Preface


This text is an introductory course in real analysis, intended for students who 
have completed a calculus sequence. It is also designed to serve as preparation 
for advanced mathematics courses of many sorts. Though there are occasional 
references to it in exercises, linear algebra is not specifically a prerequisite for 
this text. Nevertheless, the changing role of linear algebra in the undergraduate 
curriculum is one of the main reasons this book comes to be the way it is. In the 
past, a first course in linear algebra was generally considered to be the place 
where one "learned to do proofs." The mathematics curriculum has gradually 
changed, though, and proofs as such are no longer the main focus of the typical 
linear algebra course. As a result, a student's first extensive experience with the 
logical and organizational skills necessary for the successful construction of 
proofs is often delayed until they find themselves in courses in which success is 
predicated on possession of those very skills. 


Textbooks have been slow to adapt to these changes. This book provides a 
pathway from the calculus course to real analysis (and beyond) in which the 
discussion of the construction of proofs is a continuing and central theme. 
Throughout the text (but most especially in Part 2), proofs are not simply 
presented in final form. Rather they are shown as works-in-progress; they are 
built, and their construction is discussed along with their final content. While 
learning real analysis, the student will, it is hoped, also learn something about 
the workings of the mind of a mathematician—invaluable information if one is 
to become a researcher in or a teacher of mathematics, or both, but information 
that is often overwhelmed by the demands of the subject at hand. 


The text is organized into four parts. The material of Part 1 is the common 
foundation of most upper-level mathematics courses. The book begins with an 
introduction to basic logical structures and techniques of proof. The ideas 
introduced here, especially the crucial "forward-backward method" of 
constructing proofs, are all emphasized and used explicitly throughout the text. 
The rest of Part 1 includes discussions of the concept of cardinality, the algebraic 
and order structures of the real and rational number systems, and the natural 
numbers in their dual role as the basis for induction and as special elements in 
ordered fields. The discussion of the real and rational number systems sets the
stage for Part 2 of the book. In Part 1 it is found that these systems have much in
common, while Part 2 is devoted to discovering how they are different. 


Part 2 of the text is an in-depth examination of the completeness of the real 
number system in its various guises and a discussion of the topological structure 
of the real number system. It is here also that the discussion of the construction 
of proofs becomes most focused. The student is regularly reminded of the 
relationship between the topic at hand and the larger story by a graphic device 
called the "Big Picture." The Big Picture, which describes interrelationships 
among the various manifestations of the completeness of the real number 
system, gives a structure and unity to the subject not found in most mathematics 
courses. The organization of Part 2 is such that after its first four chapters are 
discussed, the rest may be treated in any order and at the discretion of the 
instructor. The structure provided by the Big Picture allows the option of 
showing that the important properties of the real number line do not exist in 
isolation; they are in fact equivalent. 


Part 3 of the text is a review and extension of calculus in light of the student's 
new understanding of the real number system. With the knowledge and 
experience gained in Part 2, the student can appreciate the structures and results 
of calculus to a depth not possible in a first-year course, and is prepared to deal 
with proofs presented in a more direct and traditional manner. Part 4 is a 
selection of topics in real function theory, investigated as natural outgrowths of 
questions readily understood by the student. The instructor has great latitude to 
choose topics in the second half of the text, and coverage of this material can 
range from an intensive reconstruction of calculus to a discussion approaching 
current research topics. 


The exercises in the text are many and varied. They range from the fairly 
routine—checking steps omitted from examples and observing mechanical 
results at work—to completion of partial proofs from the text to extended 
projects, some of which border on current research. There are questions in which 
the student is asked simply to discuss a statement, or to explain to their own 
satisfaction why something is true or false, free from the restrictions of a formal 
proof. There are questions in which the student is asked to find flaws in incorrect 
(though possibly convincing) "proofs," and questions in which the student is 
asked to reconsider their own proofs in light of new information. Most 
importantly, the exercises are an integral part of the text. Regular and meaningful 
cross-referencing reminds the student of the unity of the subject and highlights 
their own active role in its development. Furthermore, the exercises themselves 
constitute part of the ongoing study of the workings of the mathematical mind,
as the student is often led from one topic to another in a way that suggests, it is
hoped, that no matter how many answers one finds, there are always more 
questions to be asked. 


The diagram below may be interpreted like this: The overall flow of the 
subject is from top to bottom. The material of Chapter 1 supports everything 
else, and the ideas in Chapter 2 pop up just about everywhere. Strong 
dependence between chapters is indicated by a line, though ideas from a chapter 
represented by a higher ball might be needed in one below it (spatially or 
structurally). For instance, it is necessary to understand the material of Chapter 4 
to make sense of Chapter 5, but Chapter 4 is less directly needed in Chapter 12 
and even less so in Chapter 20. Aside from experience, there are no prerequisites 
for Chapters 14 and 19, and Chapter 22 may be taken up any time after Chapter 
5 (hence its variable position). Part 2 of the book (Chapters 5 through 12) has an 
internal structure of its own, which is described in the introduction to that 
section.


Gratitude, like proofs, should sometimes follow the forward-backward
method. I am eternally grateful to Professor Ronald Shonkwiler of the Georgia 
Institute of Technology for starting me on my way to becoming a mathematician, 
and to Professor Daniel Waterman and the late David Williams of Syracuse 
University for doing their best to help me finish the journey. In the present, I am 
indebted to Jim Spencer of the University of South Carolina at Spartanburg and 
Robert E. Zink of Purdue University, and others whose identities I will never 
know, for their careful and thoughtful reviews of the manuscript of this book. 
Their suggestions have much improved the end result. For the future, I wish to 
thank the students who have been so tolerant during the development of this text. 
They have, with the unerring radar available only to those to whom a subject is 
new, corrected shocking slews of errors, and have, for the most part, helped me 
overcome my tendency to write questions in which it is necessary to do part (b) 
before part (a). They have also, much as was hoped, asked lots of questions, and 
many of the exercises in the text were proposed by students in the course. It can 
be said with equal validity in regard to all three of these groups of people, that 
most of what is good here is theirs, while the remaining errors and oversights are 
mine alone. 


CHAPTERS 1 AND 2 


Part One 


Preliminaries 


We begin our work with a discussion of the construction of proofs. Writing 
proofs, of course, is the heart of mathematics. If there were always direct, 
mechanical processes for writing proofs, though, mathematics would not be 
nearly so fascinating. We will find that proofs need not be as mysterious as they 
might seem and that we can smooth the way considerably by making use of 
some basic organizational schemes. 


The rest of the first half of the book is best understood by thinking of what 
we will call the Big Question: How are the real numbers different from the 
rational numbers? We will tackle this in two stages. In Part 1 of the book we
will see some of the ways these two number systems are similar. It's best to 
know the ways things are alike before asking how they are different. In Part 2 of 
the book, we examine the differences between them and answer the Big 
Question. Our efforts will be richly rewarded. We will find that the property that 
distinguishes the real number system from the rational number system is 
precisely what makes calculus work. 


Through all of this we will never say what the real numbers actually are! 
That we can consider working this way is one of the remarkable features of 
mathematics. We can study how the real numbers work, blissfully unconcerned 
with what they are. We can solve the crime, so to speak, without ever knowing 
the suspects. We will finally meet the real numbers at the very end of the book. 
Agatha Christie would be proud.¹ If we don't even know what they are, how can
we hope to say that the real numbers are different from the rational numbers?
Like this: If an object X possesses some mathematical property that the object Y
does not, we can say with confidence that X is different from Y. If X is definitely
red, Y is definitely blue, and (this is most important) a red thing can't also be
blue, then X and Y must be different. This sort of argument underlies much of
what happens in this book. Be sure to watch for it. 


1 Agatha Christie annoyed faithful readers of her wonderful mysteries for decades by revealing 
essential clues only in the final scene ("I suppose you're wondering why I've called you all here ..."). 


Chapter 1 


Building Proofs 


1.1 A QUEST FOR CERTAINTY 


The study of mathematics is the quest for a sort of certainty that can be attained 
in no other endeavor. In mathematics we can "prove" things. But what does this 
mean? Less than we might hope. Bertrand Russell, one of the foremost British 
philosophers of recent times, called mathematics "the subject in which we never 
know what we are talking about, nor whether what we are saying is true." If this 
is the case, how can we hope to prove anything? We can't! What we can do, 
however, is show with absolute certainty that each of a chain of statements is "as 
true as those before it." If we believe the statements at the beginning of the 
chain, and that the chain is properly assembled, we must believe the statements 
at the end.¹


Of course, we have to begin somewhere, and it is evident that statements at 
the beginning of such a chain can't be proved. Statements that we agree to accept 
without proof are called axioms. We may discuss whether an axiom is 
appropriate (that is, whether it describes life as we perceive it) and we might at 
some point want to spend time discussing which axioms we ought to believe and 
which we should reject. But once this issue has been settled (and for the 
purposes of this course we consider it to be so) we agree not to discuss whether 
an axiom is true or false. Though it certainly can be an activity of great value, it 
is not our goal to scrutinize a collection of axioms here. We are studying the "top 
floors" of a subject, not its "foundations." Besides, the chain of reasoning 
leading from the most basic axioms to this text is unimaginably long. Bertrand 
Russell and Alfred North Whitehead took it upon themselves to build such a 
chain in their monumental Principia Mathematica. After several hundred pages, 
they were able to prove from "first principles" that 1 + 1 = 2. Fortunately, this is 
not how we will be spending our time. Foundational questions are as much 
philosophical as mathematical, and as mathematics they fall under headings 
other than analysis. 


EXERCISES 1.1 


1. What did Russell mean when he said that mathematics is "the subject in 
which we never know what we are talking about, nor whether what we are 
saying is true"? 


2. Discuss the differences in meaning of a statement of the form "It is true" 
when the assertion is made by a mathematician, a physicist, a biologist, a 
sociologist, a politician, or a used-car salesperson. 


1.2 PROOFS AS CHAINS 


It is instructive in many ways to view a proof as a chain of reasoning. To build a 
chain, we need a supply of links and a way to connect one link to another. In 
geometry class, we sometimes made two-column proofs with "Steps" on one 
side of the paper and "Reasons" on the other. A step might have been "Angle a is 
congruent to angle b," with the reason "Alternate interior angles." Steps are the 
links in the chain; reasons are the connections between them. We can safely use 
a real chain only if each of its links and the connections between them are sound. 
In mathematics, a sequence of statements, each of which is properly formed and 
correctly justified by those before it, is called a proof. 


We can construct a chain from either end, or from both ends at once. We can 
even assemble links bound for the middle into sections and then connect the 
sections. In the same way, we can work a proof from the beginning or the end (or 
even from the middle). A proof is almost never thought of straight through from 
beginning to end. In textbooks, though, proofs are usually written down from 
beginning to end, causing much unnecessary confusion. Here we will give a 
brief outline of the basic logical structures we will encounter in our work. We 
observe right off the bat that even the simplest of ideas sometimes warrant 
discussion. 


1.3 STATEMENTS 


We have already used this important word, even though we may not be entirely 
sure what it means. Since statements are the steps in our proofs—the links in our 
chains—we should examine the meaning of the word carefully. Unfortunately, it 
is a bit difficult to capture, and the results might be a bit unsatisfying: A 
statement is a grammatically meaningful sentence to which one or the other (but 
not both) of the words "true" or "false" can be attached. The appropriate one of 
these is called the truth value of the statement. We see that "1 + 1 = 2" is a
statement, since it may be labeled "true," and that "1 + 1 = 3" is a statement
because it is "false." A collection of words to which no truth value can be 
attached is simply not a statement. For instance, consider the phrase "This 
sentence is false." If we believe this to be true, then the assertion it seems to 
make is true, and consequently it is false. On the other hand, if we believe it to 
be false, the assertion it seems to make is false, and so the sentence must be true! 
However we view it, we are led to conclude that the phrase is both true and false, 
which cannot be. We resolve this paradox by saying that "This sentence is false" 
is not a statement. (We have hedged our bets by saying that the sentence seems 
to make an assertion. Since it is not a statement, it can't make any assertion at 
all.) 


The discussion of which collections of words are statements and which are 
not is another subject, and we won't go into it here. It will be enough for our 
purposes to note that a statement must assert something. This means, among 
other things, that a statement must contain a verb (in mathematics the verb is 
often =). Here is a very simple (but remarkably useful) preliminary test to check 
whether a collection of words is a statement: 


IF IT DOESN'T MAKE SENSE AS LANGUAGE, IT DOESN'T MAKE 
SENSE AS MATHEMATICS. 


The very best way to check this is to read what you write, preferably out loud. If 
your writing doesn't sound meaningful, it isn't. Many of what seem to be errors 
in understanding are actually only errors in grammar. This principle is most 
often violated in the writing of sentences that make no assertion. Sentences with 
no verbs! 


EXERCISES 1.3 


1. Decide whether the following are statements: 
(a) 2 + 4 = 7.
(b) sin² x + cos² x = 1.
(c) This sentence no verb.
(d) The sequence of digits 0123456789 appears somewhere in the decimal
expansion of π. (*)


1.4 CONNECTIVES 


There is a limit to the depth of ideas that can be expressed in statements like "1 + 
1 = 2" and "A pencil is a writing utensil" (these are called simple statements). 
More interesting are compound statements like "Either 1 + 1 = 2 or a pencil is a
useful tool in neurosurgery" and the more mundane "A subset of the real line is 
closed if and only if its complement is open." 


The tools with which we make compound statements out of simpler ones are 
called connectives, of which we need to consider only four. The simplest 
connective is negation: If "A" is a statement, so also is "not A," which is 
sometimes denoted ¬A (we will use the word instead of the symbol). Since A is a
statement, it has a truth value. The statement "not A" is assigned the truth value
not given to A. So "1 + 1 = 2" is a (true) statement whose (false) negation is "not
(1 + 1 = 2)." Here we could also say "1 + 1 ≠ 2," but it often takes some effort to
phrase the negation of a statement in ordinary language. (Since it doesn't really 
"connect" anything, negation is often referred to as a modifier rather than a 
connective.) 


Our first genuine connective is conjunction. If A and B are statements, their 
conjunction is the statement "A and B" (or "A ∧ B"). We may describe the truth
values of "A and B" with a truth table like the one below:


A | B | A and B
T | T |    T
T | F |    F
F | T |    F
F | F |    F


All combinations of truth values of the two statements A and B appear in the first
two columns of this table. The last column tells us that "A and B" is true only
when A and B are both true. This agrees with our usual understanding of the
word. (But the meaning of "and" is being defined in this table. It need not
coincide with ordinary usage, though it is all the better if it does.) If we assert
that "it is warm and sunny," we expect it both to be warm and to be sunny. By
specifying truth values of the statement "A and B," we are also asserting that "A
and B" is a statement as long as A and B are statements.


The next connective is disjunction. The disjunction of A and B is written "A 
or B" (or "A ∨ B"), and is given by:


A | B | A or B
T | T |   T
T | F |   T
F | T |   T
F | F |   F


The mathematical "or" is not quite the same as the grammatical "or." In ordinary 
usage, "or" can mean "one or the other or both" and it can mean "one or the other 
but not both." The former is called inclusive disjunction and the latter exclusive 
disjunction. In mathematics "or" always refers to inclusive disjunction. 
Exclusive disjunction is not uncommon in everyday language ("You can eat your 
lima beans or you can skip dessert"), but we encounter it so seldom in 
mathematics that we don't have a separate term for it. 
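The difference between the two readings of "or" can be tabulated mechanically. The following is a minimal sketch in Python; the function names inclusive_or, exclusive_or, and truth_table are our own choices for illustration and are not notation from the text. It lists the truth value of each reading for all four combinations of truth values of A and B.

```python
from itertools import product


def inclusive_or(a: bool, b: bool) -> bool:
    """Mathematical 'or': true when at least one of a, b is true."""
    return a or b


def exclusive_or(a: bool, b: bool) -> bool:
    """Everyday 'one or the other but not both'."""
    return a != b


def truth_table(connective):
    """Return the rows (A, B, value) for every combination of truth values."""
    return [(a, b, connective(a, b)) for a, b in product([True, False], repeat=2)]


for name, connective in [("inclusive", inclusive_or), ("exclusive", exclusive_or)]:
    print(f"A or B ({name})")
    for a, b, value in truth_table(connective):
        print(f"  {a!s:5} {b!s:5} -> {value}")
```

The two tables differ only in the row where A and B are both true.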


The most important connective is implication. We use implication to join the 
links in the "chains" that constitute our proofs. We write "A ⇒ B" and say "A
implies B." Here A is called the hypothesis and B is the conclusion.
Implication is defined by this table:


A | B | A ⇒ B
T | T |   T
T | F |   F
F | T |   T
F | F |   T


This deserves more discussion. We may think of an implication as a rule that is
true if it is being obeyed and false if it is being broken:


IF IT IS RAINING, THEN YOU MUST CARRY AN UMBRELLA (RAIN ⇒
UMBRELLA)


Suppose it's raining and you're carrying your umbrella. You are not breaking the
rule, and all is well (because "true ⇒ true" is true). If it's raining and you don't
have your umbrella, you are breaking the rule ("true ⇒ false" is false). If it's not
raining, you are obeying the rule whether you have your umbrella or not. The
rule is not being broken by "no rain and umbrella" (false ⇒ true) or by "no rain
and no umbrella" (false ⇒ false). Mathematicians agree to consider "false ⇒
true" and "false ⇒ false" to be true, but they do so grudgingly. Such implications
are said to be vacuously true.
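The Umbrella Rule can be played out case by case. The sketch below is one possible Python rendering, in which the names implies and rule_obeyed are hypothetical helpers introduced only for this illustration. It encodes implication as "false exactly when the hypothesis is true and the conclusion is false" and reports, for each combination of weather and equipment, whether the rule is being obeyed.

```python
def implies(hypothesis: bool, conclusion: bool) -> bool:
    """A => B: false only when the hypothesis is true and the conclusion is false."""
    if hypothesis and not conclusion:
        return False  # the rule is being broken
    return True  # the rule is being obeyed (vacuously so when the hypothesis is false)


def rule_obeyed(raining: bool, umbrella: bool) -> bool:
    """The Umbrella Rule: RAIN => UMBRELLA."""
    return implies(raining, umbrella)


for raining in (True, False):
    for umbrella in (True, False):
        status = "obeyed" if rule_obeyed(raining, umbrella) else "broken"
        print(f"raining={raining!s:5} umbrella={umbrella!s:5} rule {status}")
```

Only the combination "raining, no umbrella" is reported as broken; both lines with no rain are the vacuously true cases.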


Analyzing compound statements in this way is a very small part of symbolic 
logic. We use symbolic logic to help us understand complex statements in terms
of their (simpler) component parts. Sometimes it is possible to deduce the truth
value of a compound statement from its form rather than its content. For
example, "A or not A" is true regardless of the content or truth value of the
statement A, while "A and not A" is always false (be sure you see why this is so).


These examples are too simple to illustrate the value of symbolic logic. We 
will examine some that are more significant. For instance, it might be useful to 
have an alternative means of expressing the relationship "A ⇒ B." Is there
another statement containing the letters A and B that is false only when A is true
and B is false? The "Umbrella Rule" would be false only if it were raining but
there were no umbrellas in sight. This would correspond to "A and not B."
Perhaps "not (A and not B)" is the same as "A ⇒ B." We can check this with a
truth table:


A | B | not B | A and not B | not (A and not B) | A ⇒ B
T | T |   F   |      F      |         T         |   T
T | F |   T   |      T      |         F         |   F
F | T |   F   |      F      |         T         |   T
F | F |   T   |      F      |         T         |   T


The two rightmost columns of this table tell us that the statements "A ⇒ B" and
"not (A and not B)" have the same truth value for any combination of truth
values of A and B. We may show "not (A and not B)" instead of "A ⇒ B" if the
former is more convenient. If two statements X and Y always have the same truth
value, we say that they are logically equivalent and write X ≡ Y or X ⇔ Y. Were
we to include a column in the previous table for "(A ⇒ B) ⇔ not (A and not B),"
all its entries would be T. An expression that is always true is called a tautology.
One that is always false is a contradiction. Symbolic logic, in the rudimentary
form in which we use it, is a search for tautologies and contradictions. "A ⇔ B"
is also read "A if and only if B" or "A is necessary and sufficient for B." You will
show in Exercise 1.4.2 that "A ⇔ B" is equivalent to "(A ⇒ B) and (B ⇒ A)," thus
avoiding a possible conflict in meaning.
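Checking every line of a truth table is a finite, mechanical task, so it can be handed to a short program. The following is a sketch in Python under our own naming assumptions: IMPLIES, implies, is_tautology, is_contradiction, and equivalent are illustrative helpers, not part of the text. Implication is stored as a lookup table so that the comparison with "not (A and not B)" is a real check and not a restatement of the definition.

```python
from itertools import product

# Implication given directly by its truth table.
IMPLIES = {
    (True, True): True,
    (True, False): False,
    (False, True): True,
    (False, False): True,
}


def implies(a: bool, b: bool) -> bool:
    return IMPLIES[(a, b)]


def is_tautology(formula, n: int) -> bool:
    """True if the formula in n statement letters is true on every line."""
    return all(formula(*values) for values in product([True, False], repeat=n))


def is_contradiction(formula, n: int) -> bool:
    """True if the formula in n statement letters is false on every line."""
    return not any(formula(*values) for values in product([True, False], repeat=n))


def equivalent(f, g, n: int) -> bool:
    """True if f and g have the same truth value on every line."""
    return all(f(*v) == g(*v) for v in product([True, False], repeat=n))


print(equivalent(implies, lambda a, b: not (a and not b), 2))
print(is_tautology(lambda a: a or not a, 1))
print(is_contradiction(lambda a: a and not a, 1))
```

A check of this kind confirms what a truth table shows, but writing the table out by hand remains the better way to see why the columns agree.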


EXERCISES 1.4 


1. Make up more examples to illustrate the inclusive and exclusive "or." 


2. (a) Show that "A or not A" is a tautology and "A and not A" is a contradiction.
(b) Show that "(A ⇒ B) and (B ⇒ A)" is equivalent to "A ⇔ B."


3. Prove each of the following using truth tables:
(a) not (A or B) ⇔ ((not A) and (not B))
(b) not (A and B) ⇔ ((not A) or (not B)) [The statements in (a) and
(b) are called De Morgan's laws.]
(c) (A ⇒ B) ⇔ (not B ⇒ not A) [This is called the contrapositive.]
(d) ((A or B) ⇒ C) ⇔ ((A ⇒ C) and (B ⇒ C))
(e) (A ⇒ (B and C)) ⇔ ((A ⇒ B) and (A ⇒ C))
(f) not (not A) ⇔ A
(g) Discuss the importance of parentheses in the statements in this exercise.
In part (e), for instance, is "A ⇒ (B and C)" the same as either "(A ⇒ B) and
C" or "A ⇒ B and C"?


4. (a) Can "A ⇒ B" and "A ⇒ (not B)" ever both be true?
(b) Can "A ⇒ B" and "(not A) ⇒ B" ever both be true?


5. How many lines are there in a truth table that involves n statements?


6. (a) Construct a truth table to show that "A ⇒ (B ⇒ C)" is equivalent to "(A
and B) ⇒ C."


(b) Show that (A ⇔ B) ⇔ (not A ⇔ not B).


(c) Discuss why there is a "possible conflict in meaning" in the interpretation
of "⇔."


(d) Show that "A ⇒ (B ⇒ C)" is equivalent to "(A and B) ⇒ C" by negating
both statements and simplifying the results.


7. Show that the following are not true.
(a) (A ⇒ B) ⇒ (B ⇒ A).
(b) ((A ⇒ B) and B) ⇒ A.
(c) ((A or B) and B) ⇒ A.
(d) ((A or B) and B) ⇒ not A.


8. In this section we have referred to "not (A and not B)" as a statement. Where 
is the verb in this expression? 


1.5 PROOF BY CONTRADICTION 


The equivalence in the last table in Section 1.4 is the basis for a technique called 
proof by contradiction, which is taken in practice as: 
(A ⇒ B) ⇔ ((A and not B) is false).

You will verify this in Exercise 1.5.1. A proof by contradiction usually begins 
with the phrase "Suppose B does not hold ...," or something like it, and ends 
with "This is a contradiction." Such proofs are also called "indirect." Because
this technique is so useful, the negation of complex statements is an important 
skill. The most famous proof by contradiction is probably Euclid's proof that 
there are infinitely many prime numbers: 


Suppose there were only finitely many prime numbers. Then we could 
make a list of them all: p₁, p₂, ..., pₙ. Consider the number p = (p₁ × p₂ ×
... × pₙ) + 1. Observe that p is not divisible by any of the listed primes.
But every number larger than 1 is divisible by some prime. This
contradicts the assumption that our list contains all the primes, and the
proof is finished. 


In this proof, the hypothesis (A) consists of a collection of statements about the 
factoring of natural numbers (which go unspecified here, but which would be 
clear in the context of a course in number theory). The conclusion (B) is "There 
are infinitely many primes." Euclid showed that "Certain statements about the 
factoring of natural numbers" and the statement "There are only finitely many 
primes" cannot both be true, that is, "A and not B" is false.
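The key observation in Euclid's proof, that p is not divisible by any of the listed primes, can be watched in action for a particular list. The sketch below is a small Python illustration; the function name euclid_number and the sample list are our own choices, and math.prod assumes Python 3.8 or later. It forms p from a list and computes the remainder of p on division by each member of the list.

```python
from math import prod


def euclid_number(primes):
    """Return (p_1 x p_2 x ... x p_n) + 1 for the given list."""
    return prod(primes) + 1


listed = [2, 3, 5, 7]  # a sample list, chosen only for illustration
p = euclid_number(listed)

# Dividing p by any listed prime leaves remainder 1, never 0.
remainders = {q: p % q for q in listed}

print(p)           # 211
print(remainders)  # {2: 1, 3: 1, 5: 1, 7: 1}
```

One example proves nothing about all lists, of course; the argument in the proof is what guarantees the remainder is always 1.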


EXERCISES 1.5 
1. Verify proof by contradiction: (A ⇒ B) ⇔ ((A and not B) is false).


2. In Euclid's proof, is the number p = (p₁ × p₂ × ... × pₙ) + 1 necessarily prime
itself? Compute p using the first few primes. Is p always prime? 


1.6 CAUTION! THE SIREN SONG OF CONTRADICTION 


There is temptation to use proof by contradiction far too often. Indirect proofs 
can be very attractive since they tend to be short and sometimes give us results
almost by magic. But this comes at a cost. Magic is an activity in which what is
really going on is concealed as much as possible. Our goal in writing a proof is 
to reveal as much as possible. A proof that seems to work by magic doesn't teach 
us much, and we should avoid such things whenever we can. This advice is 
given not only to ensure that we practice a variety of techniques (though that 
would be reason enough). There are schools of mathematics and philosophy in 
which proof by contradiction is not accepted, and such objections should not be 
dismissed lightly. Proof by contradiction rests on the assumption that a statement 
must be either true or false (in fact, this is our definition of "statement"). In life,
there seem to be meaningful assertions that are neither true nor false ("Nice day,
eh?"). That there are only two truth values is called the "law of the excluded
middle," which has been an assumption of Western logic for millennia. Is it true? 
Who knows? It is one of our axioms. 


When should we use proof by contradiction, then? There is no single answer 
to this, only guidelines. A direct proof is usually preferable to an indirect proof 
(Euclid's proof can be reworded in such a way to make it direct, but its role in 
history entitles it to be left alone). The best hint that a proof by contradiction is 
appropriate is that the negation of the conclusion carries more information than 
the conclusion itself. In Euclid's proof, the negation of the conclusion (the 
assumption that the set of primes is finite) allows us to do something very useful: 
Make a list of them. Once we have specific names for the elements of this set, 
we can do arithmetic with them. 


1.7 DISJOINED CONCLUSIONS 


Consider the expression "A ⇒ (B or C)." If a rule says "If it rains you must either
have an umbrella or a raincoat," how could you defend yourself against the
accusation "It is raining but you have no umbrella"? You could say "But I have
my raincoat." The rule can be interpreted: "If it rains and you don't have an
umbrella, you must have a raincoat" (or "If it rains and you don't have a raincoat,
you must have an umbrella"). It seems that "A ⇒ (B or C)" is equivalent to "(A
and not B) ⇒ C." It also seems that "A ⇒ (B or C)" is equivalent to "(A and not
C) ⇒ B."

The truth table below checks the first of these (you will show the second
equivalence in Exercise 1.7.1). The fifth and last columns of this table are the
same, and so the two statements are equivalent. Notice that, since "A ⇒ (B or C)"
is equivalent both to "(A and not B) ⇒ C" and to "(A and not C) ⇒ B," we do not
need to establish both of the latter two statements to prove the first. We can work
with whichever looks the most promising, a decision we would make based on
whether "not B" or "not C" gives us the most useful information.


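A | B | C | B or C | A ⇒ (B or C) | not B | A and not B | (A and not B) ⇒ C
T | T | T |   T    |      T       |   F   |      F      |         T
T | T | F |   T    |      T       |   F   |      F      |         T
T | F | T |   T    |      T       |   T   |      T      |         T
T | F | F |   F    |      F       |   T   |      T      |         F
F | T | T |   T    |      T       |   F   |      F      |         T
F | T | F |   T    |      T       |   F   |      F      |         T
F | F | T |   T    |      T       |   T   |      F      |         T
F | F | F |   F    |      T       |   T   |      F      |         T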
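The eight lines of this table can also be generated by brute force. The following Python sketch makes two assumptions of its own: the helper name implies, and the encoding of implication as "(not a) or b", which is false exactly when a is true and b is false and so agrees with the implication table of Section 1.4. It builds each row in the same column order as the table and compares the fifth and last columns.

```python
from itertools import product


def implies(a: bool, b: bool) -> bool:
    """A => B, encoded as (not A) or B."""
    return (not a) or b


rows = []
for a, b, c in product([True, False], repeat=3):
    b_or_c = b or c
    left = implies(a, b_or_c)        # A => (B or C)
    a_and_not_b = a and not b
    right = implies(a_and_not_b, c)  # (A and not B) => C
    rows.append((a, b, c, b_or_c, left, not b, a_and_not_b, right))

for row in rows:
    print(" ".join("T" if value else "F" for value in row))

# The fifth and last columns agree on every line.
columns_agree = all(row[4] == row[7] for row in rows)
print(columns_agree)
```

Only the line with A true and both B and C false makes either statement false, which is the one way the umbrella-or-raincoat rule can be broken.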


EXERCISES 1.7 


1. Show that (A ⇒ (B or C)) ⇔ ((A and not C) ⇒ B).
1.8 PROOF BY CASES—DISJOINED HYPOTHESES 


Consider the expression "(A or B) ⇒ C." You showed in Exercise 1.4.3.d that this
is equivalent to "(A ⇒ C) and (B ⇒ C)." This equivalence provides us with a
very useful technique called proof by cases. Suppose we wish to show that it is
always true that x ≤ |x|. The definition of the absolute value changes depending
on whether x is negative or not, and so we may rephrase the problem: "If x is
either negative or nonnegative, then x ≤ |x|." We've changed the hypothesis from
"x is a number" to "x < 0 or x ≥ 0." The proof can be done like this:


Case 1: If x < 0, then |x| = −x, which is positive, and so x < 0 < −x = |x|
and x ≤ |x|.

Case 2: If x ≥ 0, then x = |x|, and so x ≤ |x|.

In either case, x ≤ |x|.


There are two important rules to keep in mind while constructing a proof by 
cases: First, the cases must exhaust all possibilities. In the above proof, at least 
one case must apply to any value of x we might choose. This proof would not be
valid had we said "If x < 0 then x ≤ |x|; if x > 0 then x ≤ |x|," since we would not
have established the result for x = 0. Second, all cases must lead to the
conclusion of the theorem. It is common, but a bit careless, to say something like
"If x < 0 then x < |x|; if x ≥ 0 then x = |x|, consequently x ≤ |x|."

Dividing a proof into cases is somewhat of an art. Sometimes the cases can
be selected in more than one way so it is not clear what the cases should be, and
sometimes it does not even become clear until late in a proof that the hypothesis
contains a disjunction. When you have finished a proof by cases, you should
examine it to see that the above two conditions are met and look to see whether
part of your proof might be more general than you thought. We might have
divided this proof into three cases: "If x < 0 then x < 0 < |x|, and so x ≤ |x|; if x >
0 then x = |x|, and so x ≤ |x|; if x = 0 then x = 0 = |x|, and so x ≤ |x|; consequently
x ≤ |x| for all x." The third case is resolved in the same way as the second, and so
it was not necessary to consider it separately. This proof is not wrong; it is just
not as good as it might be. 
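The two-case structure of the proof that x ≤ |x| can be mirrored in a short program. The sketch below is a Python spot-check, not a proof; the names abs_by_cases and holds_by_cases and the list of sample values are our own, chosen for illustration. Every value falls into exactly one branch, so the cases are exhaustive, and each branch retraces the reasoning of the corresponding case.

```python
from fractions import Fraction


def abs_by_cases(x):
    """Absolute value computed from its two-case definition."""
    return -x if x < 0 else x


def holds_by_cases(x) -> bool:
    """Retrace the reasoning of the proof for one value of x."""
    if x < 0:
        # Case 1: |x| = -x is positive, so x < 0 < -x = |x|.
        return x < 0 < -x == abs_by_cases(x)
    # Case 2: x >= 0 (this includes x = 0), so x = |x|.
    return x == abs_by_cases(x)


samples = [Fraction(-7, 2), -1, 0, Fraction(1, 3), 5]

all_hold = all(holds_by_cases(x) and x <= abs_by_cases(x) for x in samples)
print(all_hold)
```

Because the second branch is "x ≥ 0" and not "x > 0", the value x = 0 needs no third branch, which matches the remark above about the three-case version.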


EXERCISES 1.8 


1. (a) Show that ((A ⇒ C) and (B ⇒ D)) ⇒ ((A or B) ⇒ (C or D)).
(b) How does (a) reflect on our comments about proof by cases? 


(c) Show that the converse of the implication in (a) does not hold. (You 
should first consider carefully what the converse is!) 


2. If n is a natural number with n > 3, show that the expression Hin—3y! is always a
natural number. (Hint: n must be either a multiple of 3, one more than a 
multiple of three, or two more than a multiple of 3.) 


1.9 OPEN STATEMENTS AND QUANTIFIERS 


How does an expression like "x² + 2x + 1 = 4" fit into our discussion? 
Technically speaking, it isn't grammatically correct (the symbol = indicates that 
the numbers on either side of it have the same value, but "x² + 2x + 1" is not a 
number), and so in the strictest sense it is not a statement at all. We understand, 
though, that x is a placeholder (or variable). When we insert a number in its 
place, this expression becomes a statement. 

Such expressions are called open statements, and they present special 
difficulties. An open statement might be true regardless of the value of the 
variable, like "x² + 2x + 1 = (x + 1)²." It might be true for some values of the 
variable but not for others, like "x² + 2x + 1 = 4." Finally, it might be false for 
every value of the variable, as in "x is a real number and x² = −1." 
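
The three kinds of open statement can be explored numerically. The sketch below is an illustration only, not a proof: it treats each open statement as a Python predicate and tries it on a small sample of integers. The sample range and the helper name classify are assumptions made for this example.

```python
# Each open statement becomes a predicate: insert a value, get True or False.
def identity(x):
    return x**2 + 2*x + 1 == (x + 1)**2   # true for every x

def equation(x):
    return x**2 + 2*x + 1 == 4            # true for some x only

def impossible(x):
    return x**2 == -1                     # false for every real x

def classify(open_statement, sample):
    """Report how an open statement behaves on a finite sample of values."""
    results = [open_statement(x) for x in sample]
    if all(results):
        return "true for every sampled value"
    if any(results):
        return "true for some sampled values"
    return "false for every sampled value"

sample = range(-5, 6)  # assumption: a few integers stand in for the reals
solutions = [x for x in sample if equation(x)]

print(classify(identity, sample))
print(classify(equation, sample), solutions)
print(classify(impossible, sample))
```

A sample can suggest which kind of open statement we are looking at, but only an argument valid for every value settles the first and third kinds.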

We can't say whether an open statement is true or false until we specify the 
values of the variable for which our claim is to be made. This is called 
quantification. There are only two quantifiers: the universal quantifier, 
denoted ∀, and read "for each," "for all," "for every," or "whenever," and the 
existential quantifier, read "for some" or "there exists" and denoted ∃ (we will 
use these two symbols). Distinctions among the various "pronunciations" of the 
symbols are largely stylistic; use the one that sounds best. The phrase "such 
that" (denoted ∋) usually accompanies the existential quantifier for purely 
grammatical reasons. We write "∃x ∋ (x > 3)" and say "there exists x such that x 
> 3." The symbol ∋ serves no mathematical purpose, and it is best to consider it 
part of the quantification of the variable rather than part of the open statement. 

A quantified statement consists of a list of variables with their quantifiers, 
followed by an open statement. We will call this standard form. The following 
quantified statements are true: 


∀x (sin² x + cos² x = 1) 

∃x ∋ (x² + 2x + 1 = 4) 

∃x ∋ (x is a real number and x² = 3) 

∀x ∃y ∋ (y³ = x), 


while these are false: 


∀x (sin x + cos x = 1) 

∀x (x² + 2x + 1 = 3) 

∃x ∋ (x is a real number and x² = −1) 

∀x ∃y ∋ (y is a real number and y² = x). 


To make sense of quantified statements, we must assume some universe from 
which the values of the variables are to be chosen. Unless we say otherwise, this 
will always be the set of real numbers. The first statement above really should be 
written ∀x (x ∈ ℝ ⇒ sin² x + cos² x = 1) or ∀x ∈ ℝ (sin² x + cos² x = 1), but we 
usually dispense with explicit mention of the universe unless it is of special 
importance. 
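
On a finite universe the two quantifiers can be evaluated mechanically, which makes the role of the universe easy to see. The following is a minimal sketch under stated assumptions: the universes are small finite lists chosen for illustration, and the helper names for_all and there_exists are hypothetical.

```python
from fractions import Fraction

def for_all(universe, statement):
    """Universal quantifier over a finite universe."""
    return all(statement(x) for x in universe)

def there_exists(universe, statement):
    """Existential quantifier over a finite universe."""
    return any(statement(x) for x in universe)

integers = list(range(-5, 6))             # assumed finite universe
with_half = integers + [Fraction(1, 2)]   # the same universe plus 1/2

square_at_least_x = lambda x: x**2 >= x
doubles_to_one = lambda x: 2 * x == 1

print(for_all(integers, square_at_least_x))     # holds for each listed integer
print(for_all(with_half, square_at_least_x))    # fails at x = 1/2
print(there_exists(integers, doubles_to_one))   # no listed integer works
print(there_exists(with_half, doubles_to_one))  # x = 1/2 works
```

The open statements are the same in each pair of lines; only the universe changes, and the truth value changes with it.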


In ordinary language, we often fail to state quantifiers explicitly. For 
instance, if we say, "It rains where I live," we (probably) mean this to be 
existentially quantified: "∃d ∈ {days} ∋ (It rains on day d where I live)." On the 
other hand, we probably would intend the statement "People should treat each 
other well" to be universally quantified. While this should not be a problem in 
mathematics, there is still confusion. If we say "Show that a symmetric matrix 
has real eigenvalues," the statement is meant to be universally quantified. An 
example (a calculation involving a specific matrix) does not prove this 
statement. Unstated quantifiers should always be taken to be universal. 


If an open statement has more than one variable, the order in which they 
appear in the standard form is very important. The definition from calculus of 
the statement "The function f is continuous at the point a" is: 


∀ε > 0 ∃δ > 0 ∋ ∀x (|x − a| < δ ⇒ |f(x) − f(a)| < ε). 


Writing this statement with the first two variables reversed, like this: 


∃δ > 0 ∋ ∀ε > 0 ∀x (|x − a| < δ ⇒ |f(x) − f(a)| < ε), 


changes its meaning dramatically. It is left to the reader to decide what, if 
anything, the latter statement says about the function f. 

In the definition of continuity, δ may change if ε changes. Generally, an 
existentially quantified variable may depend upon the variables appearing before 
it in the list. The definition of continuity would be clearer if we wrote it like 
this: 


∀ε > 0 ∃δ(ε) > 0 ∋ ∀x (|x − a| < δ(ε) ⇒ |f(x) − f(a)| < ε). 


If we had been careful to do this, it would have been clear that the two 
statements above are different since δ may depend on ε in the first but not in the 
second. Quantified statements are not often written this way, though. 
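
The effect of reversing two quantifiers can be seen on a small finite universe. This is a sketch for illustration, not the continuity example itself: the universe and the open statement x + y = 0 are assumptions chosen so that both orders can be checked exhaustively.

```python
universe = range(-2, 3)  # assumed finite universe: -2, -1, 0, 1, 2

def sums_to_zero(x, y):
    return x + y == 0

# For all x there exists y: y may be chosen after x is known (here y = -x).
forall_exists = all(any(sums_to_zero(x, y) for y in universe) for x in universe)

# There exists y such that for all x: one y must work for every x at once.
exists_forall = any(all(sums_to_zero(x, y) for x in universe) for y in universe)

# The y that goes with each x in the first statement; it depends on x.
witness = {x: next(y for y in universe if sums_to_zero(x, y)) for x in universe}

print(forall_exists, exists_forall, witness)
```

The dictionary witness plays the role of δ(ε): it records how the existentially quantified variable depends on the variable quantified before it.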


The way a statement is quantified will guide us in constructing its proof. The 
basic methods for proving universally and existentially quantified statements are 
very different. It is extremely important to determine which sort of statement we 
are considering before we begin a proof, and we must always take the time to 
pick out any unstated quantifiers. 


EXERCISES 1.9 


1. In which of the examples of quantified statements given in this section is the 
universe important? (When could a change in the universe change the truth 
value?) 


2. Characterize the functions for which 
∃δ > 0 ∋ ∀ε > 0 ∀x (|x − a| < δ ⇒ |f(x) − f(a)| < ε). 


3. Convince yourself of the truth values of the examples of quantified 
statements given in this section (we will not have the techniques to prove 
these statements until later). 


4. (a) Consider the effect on a quantified statement of changing the quantifiers. 
What happens in the examples given if each ∀ is changed to an ∃ and vice 
versa? Are any of them still true? Does one that is true necessarily become 
false? 


(b) Consider the effect on a quantified statement of changing the quantifiers 
and negating the open statement. 


1.10 PROVING UNIVERSAL STATEMENTS 


The most direct way to prove a universally quantified statement would be to 
make a separate argument for each value of the variable. For instance, we may 
prove that ∀x ∈ {0, 1} (x² = x) by checking that the statement is true if x = 0 (since 
0² = 0) and if x = 1 (since 1² = 1). Notice that (i) the two "proofs" have no 
variables in them (an open statement becomes a statement when a value is 
inserted for the variable), and (ii) the proofs are different (though not much so in 
this case). This is not practical, of course, if the universe is larger, or if the 
problem is given in such a way that identifying all of the elements in the 
universe might be as difficult as doing the rest of the proof. To overcome this, 
we will construct proofs that consist of open statements (rather than statements), 
being careful that each step is valid for each value of the variable. If we can't 
construct a proof that is valid for all values of the variable at once, we might 
resort to a proof by cases. Since we can't keep in mind every value of the 
quantified variable, we will begin such a proof by giving a name to one element 
of the universe. We then must be sure that our argument is valid for that 
particular element. 
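
The element-by-element approach described at the start of this section can be written out directly when the universe is finite. The sketch below is an illustration under that assumption; the helper name check_each is hypothetical.

```python
def check_each(universe, statement):
    """Check a universal statement one element at a time.

    Returns the list of counterexamples; an empty list means every
    element of this (finite) universe passed its own separate check.
    """
    return [x for x in universe if not statement(x)]

squares_to_itself = lambda x: x**2 == x

print(check_each([0, 1], squares_to_itself))      # no counterexamples
print(check_each([0, 1, 2], squares_to_itself))   # 2 is a counterexample
```

This cannot be carried out when the universe is the set of real numbers, which is why the rest of the section works with a named but otherwise arbitrary element instead.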


Here is a simple example. Let us prove: "If x > 1, then x² > x." Since there is 
no explicit quantifier in this statement, we assume it is intended to be true for all 
x greater than 1. We begin the proof by naming the element we will be 
discussing: 


Let x > 1. (Sometimes we say "Let x > 1 be given.") 
Then it is also true that x > 0. 
So (x)(x) > (x)(1). 
That is, x² > x. 


We have used the fact that the ordering on the real numbers is transitive (in order 
to say that x > 1 and 1 > 0 imply x > 0), and the fact that multiplying both sides 
of an inequality by a positive number does not change the inequality (both of 
these properties of real numbers will be proved in Theorem 4.12). Though we 
have made some assumptions about the ordering of the real numbers, the only 
property of x itself we have used is that it is greater than 1. 

Here is another "example." I will "prove" that if x is positive and y < x, then y 
is positive (!). Here both variables are universally quantified, and so the proof 
begins like this: 


Let x > 0. 
Let y < x. 
Since x > 0, x² > x. 
So x(x − 1) = x² − x > 0. 
Since x > 0 and x(x − 1) > 0, we must have x − 1 > 0. 
This argument can be repeated to show x − 2 > 0, x − 3 > 0, .... 
No matter what x is, these numbers go to −∞, so one of them (say x − k) is 
smaller than y. 
So y > x − k > 0, and y > 0. 


This proof is not valid (aside from the fact that the statement is silly), because it 
includes a statement that is not true for all stated values of x (which statement?). 
On the other hand, the proof did start correctly (whatever small comfort that 
might be) and some of the statements are valid inferences from those above them 
(which ones?). 


EXERCISES 1.10 


1. What is wrong with the "proof" at the end of the section? Find the first 
statement in it that is not valid. For which values of which variable (if any) is 
that particular statement true? Is the proof valid if the statement of the 
problem is adjusted to include only those particular values of the variable? 
Decide whether each statement in the "proof" follows from those above it. 


1.11 PROVING EXISTENTIAL STATEMENTS 


This is easier to describe than the previous problem, but not always to carry out. 


The "technique" for proving existentially quantified statements can be summed 
up in one sentence (whose sweeping generality is in distinct contrast to its lack 
of practical advice!): 


The best way to prove that something exists is to find one. 


To prove that ∃x ∋ (x² + 2x + 1 = 4) we need only observe that the open 
statement becomes true when we set x = 1. Notice that we do not need to find all 
values of x for which the equation holds to show that the existential statement is 
true. Giving an example is not the only way to prove an existentially quantified 
statement. Sometimes it is possible to show that the assumption that something 
does not exist leads to a contradiction. But we should always look for an 
example first. 
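
Checking a proposed example is a single computation. The sketch below verifies the example x = 1 and then, purely for contrast, searches an assumed range of integers for every solution; the range and the function name is_witness are choices made for this illustration.

```python
def is_witness(x):
    """The open statement x^2 + 2x + 1 = 4 with a value inserted."""
    return x**2 + 2*x + 1 == 4

# One example is enough to prove the existential statement.
found_one = is_witness(1)

# Searching an assumed range finds every integer solution in it,
# which is more than the existence proof requires.
all_witnesses = [x for x in range(-10, 11) if is_witness(x)]

print(found_one, all_witnesses)
```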


It is important to remember that, although we can (and should) prove 
existentially quantified statements by finding an example, a universally 
quantified statement can't be proved in this way. Observing that the inequality 
holds for x = 3 does not prove that ∀x ∈ ℝ (x² + 2x + 1 ≥ 0). Pictures, such as Venn 
diagrams, are examples (they are sets of points) and so cannot be used to prove 
universal statements. They can give us ideas for proofs, though, and can provide 
examples for existential statements. For instance, here are Venn diagrams that 
suggest that A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C). 


[Figure: two Venn diagrams of sets A, B, and C, one with A ∩ (B ∪ C) shaded and the other with (A ∩ B) ∪ (A ∩ C) shaded.] 


These diagrams might seem to give compelling evidence, but they don't prove 
the result, because they are just one example (we do have some idea, of course, 
that such a Venn diagram is a very general representation of the possible 
relationships among three sets, but sorting this out is not the issue now). On the 
other hand, the Venn diagram below proves that it is possible to have 
A ∩ B ∩ C = ∅ even though none of A ∩ B, A ∩ C, nor B ∩ C is empty. Notice that the 
quantifier associated with the picture above is universal, while the statement 
being described in the picture below has an existential quantifier ("It is 
possible"). 
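
The same distinction can be made with concrete finite sets in place of pictures. In this sketch the particular sets are arbitrary choices made for illustration: the first group is an example that proves the existential claim, while the second is a single instance that agrees with the universal claim without proving it.

```python
# Example for the existential claim: the triple intersection is empty
# although no pairwise intersection is. (The elements are arbitrary.)
A, B, C = {1, 2}, {2, 3}, {1, 3}
pairwise = [A & B, A & C, B & C]
triple = A & B & C
print(pairwise, triple)

# One instance of the distributive law. It agrees with the universal
# claim but does not prove it, just as a single Venn diagram does not.
P, Q, R = {1, 2, 3}, {3, 4}, {2, 5}
print(P & (Q | R) == (P & Q) | (P & R))
```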


1.12 NEGATING A QUANTIFIED STATEMENT 


There is only one technique for negating quantified statements. Even in ordinary 
(careful) language, the opposite of "This always happens" is "There is one 
instance when it does not happen" (but not "This never happens"). On the other 
hand, the opposite of "This happens at least once" is "This never happens." The 
negation of quantified statements is relatively simple if they are written in 
standard form: Change each quantifier to the other one (remember there are only 
two), and negate the open statement. The statement ∀x ∈ ℝ (x² + 2x + 1 ≥ 0) is 
negated: ∃x ∈ ℝ ∋ (x² + 2x + 1 < 0). This works for statements with any number of 
variables. The negation of ∃x ∋ ∀y (x² = y) is ∀x ∃y ∋ (x² ≠ y). (Which of these is 
true?) The statement "The function f is continuous everywhere" would be written 
(note that "everywhere" is a universal quantifier): 


∀a ∀ε > 0 ∃δ > 0 ∋ ∀x (|x − a| < δ ⇒ |f(x) − f(a)| < ε). 


The negation of this is then: 


∃a ∃ε > 0 ∋ ∀δ > 0 ∃x ∋ (|x − a| < δ and |f(x) − f(a)| ≥ ε). 


[Recall that (not (A ⇒ B)) ⇔ (A and not B).] Changing the quantifiers can also 
change which variables depend on which others. In the first statement above, δ 
depends on ε and a. In the second statement, it doesn't. 
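
The negation rule can be tested mechanically on a finite universe. The sketch below is illustrative only: the universe and the three open statements are assumptions, and the helper names are hypothetical. For each open statement it evaluates a statement of the form "for all x there exists y" together with the negation produced by the rule.

```python
universe = range(-3, 4)  # assumed finite universe

def forall_exists(p):
    """For all x there exists y such that p(x, y)."""
    return all(any(p(x, y) for y in universe) for x in universe)

def exists_forall_not(p):
    """There exists x such that for all y, not p(x, y)."""
    return any(all(not p(x, y) for y in universe) for x in universe)

open_statements = {
    "x + y = 0": lambda x, y: x + y == 0,
    "x * y = 1": lambda x, y: x * y == 1,
    "y > x": lambda x, y: y > x,
}

for name, p in open_statements.items():
    # The rule says these two always have opposite truth values.
    print(name, forall_exists(p), exists_forall_not(p))
```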


EXERCISES 1.12 
1. Which of the statements ∃x ∋ ∀y (x² = y) or ∀x ∃y ∋ (x² ≠ y) is true? 


2. Translate each of the following into a quantified statement in standard form, 
write its symbolic negation, and then state its negation in words: 


(a) Everybody loves somebody sometime. 


(b) You can fool all of the people some of the time. 

(c) You can't teach an old dog new tricks. 

(d) When it rains, it pours. 

(e) Things have never been more like they are right now. 

(f) If we don't hang together, surely we shall all hang separately. 
(g) Out of sight, out of mind. 


3. (a) Define (in words) the phrase "The function f: A → B is one-to-one." (If 
you've never encountered this phrase before, skip this problem.) 


(b) Carefully state (in standard form) the definition of the phrase "The 
function f: A → B is one-to-one." 


(c) Carefully state (in standard form) the definition of the phrase "The 
function f: A → B is not one-to-one." 


(d) Define (in words) the phrase "The function f: A → B is not one-to-one." 


(e) Repeat (a) through (d) with "one-to-one" replaced by "onto." 


1.13 THE FORWARD-BACKWARD METHOD 


We have seen that in certain circumstances we can tell, just by looking at the 
type of problem we are trying to solve, where a proof should begin and where it 
should end. This idea is the beginning of a powerful technique called the 
forward-backward method (the name seems to have been coined by Daniel 
Solow in his book How to Read and Do Proofs—see the references). The 
forward-backward method is the mathematical equivalent of building a chain 
from both ends at once. As builders of proofs rather than chains, we enjoy some 
advantages over blacksmiths. The collections of steps and reasons from which 
we may choose are fairly small, and each step can be linked to only a few others, 
often in only one way. Thinking in this way, the forward-backward method can 
be boiled down to some simple advice: 


(1) Every statement begins somewhere and ends somewhere: An equality 
begins on one side and ends on the other; an implication begins with the 
hypotheses and ends with the conclusion; and so on. 


(2) We want proofs to get us from one place to another. Think of the 
things you know that begin where you are (statements whose hypotheses 
match the hypotheses of the problem). Now think of the things you know 
that end where you want to go (statements whose conclusions match the 
conclusions of the problem). 


(3) Put the things you thought of in (2) in place in the chain. The ends of 
the chain should be closer together in some sense than they were before. 
Essentially, your hypotheses and conclusions have changed. Now go back 
to step (2) and start again. 


Let's do a simple proof: If A and B are sets and A ⊆ B, then C(B) ⊆ C(A). [C(A) 
is the complement of A.] Our goal is to show that one set is a subset of another 
(C(B) ⊆ C(A)), so we must show that every element of C(B) is also an element of 
C(A). This indicates a universal quantifier, and so we already know what the first 
and last steps of our proof must be. (Proofs involving sets will be discussed in 
more detail in Section 1.15.) We will indicate a proof by the forward-backward 
method with the following structure. 


Let x ∈ C(B). 

* * * 

Then x ∈ C(A). 


We know the definition of the complement of a set, and since we know nothing 
else about A and B, it seems that this definition will have to come into play. We 
can insert another step at the top and one at the bottom (we will indicate new 
lines with the symbol →): 


Let x ∈ C(B). 
→ Then x ∉ B. 

* * * 

→ x ∉ A. 
Then x ∈ C(A). 


We have taken advantage of the fact that the definition of the complement of B 
begins at "x ∈ C(B)" and ends at "x ∉ B." [Since it is a definition, it also begins at 
"x ∉ A" and ends at "x ∈ C(A)."] We haven't written "Then x ∉ A" at the bottom, 
since we aren't yet sure that it will follow from things above it (we hope it will, 
but this will not be a proof until all the pieces are in place). This is, of course, 
neither a very long nor a very difficult proof. We should observe at this stage 
only that the pieces of the chain at the beginning and at the end are valid bits of 
reasoning. Our problem now has changed. Now we don't have to worry about 
whether the first statement in our proof implies the last, but only whether the last 
statement at the top (x ∉ B) implies the first statement at the bottom (x ∉ A). We 
do know that A ⊆ B. If x were an element of A, then it would have to be an 
element of B, and this is not the case. This is a small proof by contradiction, 
which we now insert, and we are done: 


Then x ∉ B. 
If x were an element of A, it would have to be an element of B, and this is 
not the case. 


We can now view the whole proof, inserting the reasons for each step on the 
right (this will be left to you for most of the book). 


Let x ∈ C(B).                           This is where the proof must start. 

Then x ∉ B.                             Definition of the complement. 

If x were an element of A, it would     Follows from the line above, 
have to be an element of B, and         since A ⊆ B. 
this is not the case. 

x ∉ A.                                  From the two lines above. 

Then x ∈ C(A).                          This is where the proof must end. 


We don't always have to alternate between backward and forward steps. (In 
practice, most proofs are constructed primarily of backward steps.) Notice that 
the last step we inserted was not simply a definition or theorem. We got it by 
considering where we stood in the proof and the information available to us. The 
forward-backward method usually is not sufficient to do an entire proof. 
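
The result just proved can also be checked exhaustively inside one small universe. This does not replace the proof, since it covers only the assumed four-element universe below, but it is a useful sanity check. The helper names subsets and complement are hypothetical.

```python
from itertools import combinations

universe = frozenset(range(4))  # assumed small universe {0, 1, 2, 3}

def subsets(s):
    """All subsets of a finite set."""
    items = list(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def complement(s):
    return universe - s

# Look for a pair with A a subset of B whose complements fail to reverse.
counterexamples = [(A, B) for A in subsets(universe) for B in subsets(universe)
                   if A <= B and not complement(B) <= complement(A)]

print(len(subsets(universe)), counterexamples)
```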


1.14 A BIGGER AND BETTER EXAMPLE 


We have just made quite a big deal of a very simple proof. Let us try the method 
on something substantial. We will prove a result from calculus: 


If the function f is differentiable at a point a, then it is continuous there. 


This is a significant result, and the proof is not at all easy. Adding to the 
difficulty, we will discover that it is an "existence" proof, so we will need to find 
an example of some sort. Existence steps are the most difficult part of any proof, 
and we will come to find, sadly, that they are where the forward-backward 
method is least helpful. However, the forward-backward method will help make 
it clear where existence steps occur in a proof, allowing us to focus our attention 
where it is most needed. 

You may have seen this before, though probably not in this much detail. 
Don't be concerned with every step. What is important now is the way the proof 
is constructed. We know how it begins and ends: 


f is differentiable at the point a. 

* * * 

f is continuous at the point a. 


The definitions of differentiability and continuity supply one more link at each end: 


f is differentiable at the point a. 
→ lim_{x→a} (f(x) − f(a))/(x − a) exists (we may call it L). 

* * * 

→ lim_{x→a} f(x) = f(a). 
f is continuous at the point a. 


Our chain now has two links at each end. The statement we have inserted at the 
bottom, lim_{x→a} f(x) = f(a), is defined: 


∀ε > 0 ∃δ > 0 ∋ ∀x (0 < |x − a| < δ ⇒ |f(x) − f(a)| < ε). 


This is universally quantified (whatever else goes on in the proof, we must show 
that something happens "∀ε > 0"). This tells us what to do next: 


f is differentiable at the point a. 
lim_{x→a} (f(x) − f(a))/(x − a) exists (we may call it L). 
→ Let ε > 0 be given. 

* * * 

lim_{x→a} f(x) = f(a). 
f is continuous at the point a. 


With ε fixed, the rest of the definition becomes the statement we must reach just before the end: 


f is differentiable at the point a. 
lim_{x→a} (f(x) − f(a))/(x − a) exists (we may call it L). 
Let ε > 0 be given. 

* * * 

→ ∃δ > 0 ∋ ∀x (0 < |x − a| < δ ⇒ |f(x) − f(a)| < ε). 
lim_{x→a} f(x) = f(a). 
f is continuous at the point a. 


We have found that our goal in the proof has become to show that this δ exists, 
meaning that we will have to give an example of such a δ. Let us turn our 
attention back to the forward part of the proof. The definition of the limit there 
tells us something about this ε: 


f is differentiable at the point a. 
lim_{x→a} (f(x) − f(a))/(x − a) exists (we may call it L). 
Let ε > 0 be given. 
→ ∃δ₁ > 0 ∋ ∀x (0 < |x − a| < δ₁ ⇒ |(f(x) − f(a))/(x − a) − L| < ε). 

* * * 

∃δ > 0 ∋ ∀x (0 < |x − a| < δ ⇒ |f(x) − f(a)| < ε). 
lim_{x→a} f(x) = f(a). 
f is continuous at the point a. 


Here we have used a little foresight. We don't know whether the δ that the 
definition of the limit gives us here is the one we're looking for, and so we have 
given this one a different name. 


Now we have reached the point where we will need to do some creative 
work. The problem gets a little harder, and you can consider the lesson of the 
forward-backward method to be over (the next few paragraphs can be safely 
skipped at this time). The line we've just inserted looks a little like the one we 
want to reach (at least it has some of the same pieces). Perhaps we can 
manipulate them to make them look more alike. Look at the step we can reach 
from the beginning: 


∃δ₁ > 0 ∋ ∀x (0 < |x − a| < δ₁ ⇒ |(f(x) − f(a))/(x − a) − L| < ε) 


and where we need to go: 


∃δ > 0 ∋ ∀x (0 < |x − a| < δ ⇒ |f(x) − f(a)| < ε). 


We need to make the first of these look more like the second. The expression f(x) 
− f(a) appears in both, but it is sitting all by itself in the second line. Let's put it 
by itself in the first. With a little algebra, we can change the conclusion in the 
first line to: |f(x) − f(a) − L(x − a)| < ε|x − a|. We will insert this into our proof in 
a moment, but it still is not quite what we want, since |f(x) − f(a)| is not yet by 
itself. Some free association is needed. Is there a way to relate an expression like 
|X − Y| to |X| and |Y|? There is. It's a version of the Triangle inequality: |X| − |Y| ≤ 
|X − Y|. Letting f(x) − f(a) be X and L(x − a) be Y, and doing a little algebra, we 
find |f(x) − f(a)| < ε|x − a| + |L||x − a|. With a bit more algebra, we can insert: 


f is differentiable at the point a. 
lim_{x→a} (f(x) − f(a))/(x − a) exists (we may call it L). 
Let ε > 0 be given. 
∃δ₁ > 0 ∋ ∀x (0 < |x − a| < δ₁ ⇒ |(f(x) − f(a))/(x − a) − L| < ε). 
→ ∃δ₁ > 0 ∋ ∀x (0 < |x − a| < δ₁ ⇒ |f(x) − f(a)| < (ε + |L|)|x − a|).   (★) 

* * * 

∃δ > 0 ∋ ∀x (0 < |x − a| < δ ⇒ |f(x) − f(a)| < ε). 
lim_{x→a} f(x) = f(a). 
f is continuous at the point a. 


Now we are getting very close. We have picked δ₁ in such a way as to make |f(x) 
− f(a)| less than something, but we need it to be less than ε. The right side of the 
inequality marked by ★ will be less than ε if we can make |x − a| less than ε/(ε + 
|L|). Let us set δ₂ = ε/(ε + |L|). Then if |x − a| < δ₂, the right side of the 
inequality will be less than ε. But we must be cautious. In order for the ★ 
inequality to hold at all, we must have |x − a| < δ₁. We need both of these things 
to happen at once, and we can do that with a careful choice of δ: 


f is differentiable at the point a. 
lim_{x→a} (f(x) − f(a))/(x − a) exists (we may call it L). 
Let ε > 0 be given. 
∃δ₁ > 0 ∋ ∀x (0 < |x − a| < δ₁ ⇒ |(f(x) − f(a))/(x − a) − L| < ε). 
∃δ₁ > 0 ∋ ∀x (0 < |x − a| < δ₁ ⇒ |f(x) − f(a)| < (ε + |L|)|x − a|).   (★) 
→ Let δ₂ = ε/(ε + |L|). 
→ If 0 < |x − a| < δ₂, then (ε + |L|)|x − a| < (ε + |L|)(ε/(ε + |L|)) = ε. 
→ Let δ be the smaller of δ₁ and δ₂. 
→ If 0 < |x − a| < δ, then 0 < |x − a| < δ₁ and 0 < |x − a| < δ₂. 
→ If 0 < |x − a| < δ, then |f(x) − f(a)| < (ε + |L|)|x − a| < ε. 
∃δ > 0 ∋ ∀x (0 < |x − a| < δ ⇒ |f(x) − f(a)| < ε). 
lim_{x→a} f(x) = f(a). 
f is continuous at the point a. 


This difficult proof is finished, and since the second, fourth, and fifth of the steps 
we have just inserted serve only to explain the first and the third, note that only 
two of the steps required any creative activity! The great value of the 
forward-backward method is that it allows us to deal with those parts of a proof that are 
merely structural in a routine way, leaving us to focus our attention on those 
issues that really require work. We will examine proofs in this way throughout 
the book. Though it will not always finish a problem for us, we will find that the 
forward-backward method is almost always useful in locating the real issues in a 
proof. 
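
The choice of δ made in the proof can be tried out numerically on one concrete function. The sketch below is an illustration, not part of the proof: the function f(x) = x², the point a = 3, and the value δ₁ = ε (which happens to work for this particular f, since its difference quotient minus L equals x − a) are all assumptions made for the example.

```python
def f(x):
    return x * x          # assumed example function

a, L = 3.0, 6.0           # f'(3) = 6 for this f

def choose_delta(eps):
    # For f(x) = x^2 the difference quotient minus L equals x - a,
    # so delta_1 = eps works here (specific to this example).
    delta_1 = eps
    delta_2 = eps / (eps + abs(L))
    return min(delta_1, delta_2)

def spot_check(eps, steps=1000):
    """Try points with 0 < |x - a| < delta and confirm |f(x) - f(a)| < eps."""
    delta = choose_delta(eps)
    for k in range(1, steps):
        offset = delta * k / steps
        for x in (a - offset, a + offset):
            if not abs(f(x) - f(a)) < eps:
                return False
    return True

print([spot_check(eps) for eps in (1.0, 0.1, 0.001)])
```

A spot check of finitely many points proves nothing about all x, of course; it only shows the recipe "δ is the smaller of δ₁ and δ₂" behaving as the proof says it must.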


EXERCISES 1.14 


1. Give a much briefer explanation (not a proof) of why the theorem in this 
section is true. (The denominator of a certain fraction goes to 0. What does 
the numerator do?) 


1.15 SETS 


We need not dwell on the technical aspects of set theory here. We are already 
reasonably good at manipulating sets. We will consider now only how the ideas 
we've discussed in this chapter relate to proofs involving sets. We are generally 
concerned with only two possible relationships between sets—containment and 
equality—and four operations—union, intersection, complement, and relative 
complement (or difference). 


CONTAINMENT: The statement "A is a subset of B" is defined: 
∀x (x ∈ A ⇒ x ∈ B). This is universally quantified, and so we know that the proof 
of a theorem in which we must show that A ⊆ B should begin with the phrase 
"Let x ∈ A." This usually should be followed immediately by a statement of the 
definition of A. The rest of the proof must consist entirely of statements that 
are true for any element of A, and it must end with "Then x ∈ B," which would 
usually be immediately preceded by the definition of B. There is a great 
temptation to look for shortcuts in such proofs, but the number of set 
containment proofs that can be done properly any other way is so small that it's 
not usually worth considering other approaches. 


EQUALITY: Two sets are equal if each is a subset of the other. A proof that A = 
B has two parts, one of which should begin "Let x ∈ A" and end "Then x ∈ B," 
the other beginning "Let x ∈ B" and ending "Then x ∈ A." The number of set 
equality proofs that can be done properly any other way is so small that it's not 
usually worth considering other approaches. 


SET OPERATIONS: The definitions of the various set operations are: 


x ∈ A ∪ B if ((x ∈ A) or (x ∈ B)). 
x ∈ A ∩ B if ((x ∈ A) and (x ∈ B)). 
x ∈ C(A) if (not (x ∈ A)). 
x ∈ A\B if ((x ∈ A) and not (x ∈ B)). 


These involve only the usual connectives, each of which we have considered. 
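
The four definitions can be compared with a programming language's built-in set operations. The sketch below uses Python sets; the universe and the sets A and B are arbitrary assumptions, and the helper name agrees is hypothetical. The complement is taken relative to the assumed universe.

```python
universe = set(range(10))          # assumed finite universe
A = {0, 1, 2, 3}
B = {2, 3, 4, 5}

def agrees(candidate, membership_rule):
    """True if x is in candidate exactly when the rule holds, for every x."""
    return all((x in candidate) == membership_rule(x) for x in universe)

print(agrees(A | B, lambda x: (x in A) or (x in B)))        # union
print(agrees(A & B, lambda x: (x in A) and (x in B)))       # intersection
print(agrees(universe - A, lambda x: not (x in A)))         # complement C(A)
print(agrees(A - B, lambda x: (x in A) and not (x in B)))   # difference A\B
```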


For instance, to prove that x ∈ A ∪ B (a disjoined conclusion) we should begin 
"Suppose x ∉ A" and conclude "Then x ∈ B" or vice versa. 


EXERCISES 1.15 


1. Suppose A, B, and C are sets. Do the following for each of these statements. 
First, construct a Venn diagram that indicates that the result is true, then 
prove the result, using the forward-backward method whenever you can. 


(a) A ⊆ B ⇔ C(B) ⊆ C(A). 

(b) C(A ∪ B) = C(A) ∩ C(B). 

(c) C(A ∩ B) = C(A) ∪ C(B). 

(d) A ⊆ (B ∩ C) ⇔ (A ⊆ B) and (A ⊆ C). 

(e) (A ∪ B) ⊆ C ⇔ (A ⊆ C) and (B ⊆ C). 

(f) A ⊆ (B ∪ C) ⇔ (A\C) ⊆ B. 

(g) If A ⊆ B, then B\(B\A) = A. 

(h) Is the result in (g) true if A ⊈ B? 


2. Use Venn diagrams to show that it is not true that 

(a) (A ⊆ B) ⇒ (B ⊆ A). 

(b) ((A ⊆ B) and (x ∈ B)) ⇒ (x ∈ A). 

(c) ((x ∈ A ∪ B) and (x ∈ B)) ⇒ (x ∈ A). 

(d) ((x ∈ A ∪ B) and (x ∈ B)) ⇒ (x ∉ A). 


3. Compare Exercises 1.4.3 and 1.7.1 with Exercise 1.15.1, and compare 
Exercise 1.4.7 with Exercise 1.15.2. Consider whether there is a relationship 
between symbolic logic and elementary set theory. 


4. Let S = {x : x = 5n, n = 1, 2, ...} and T = {x : x = 10n, n = 1, 2, ...}. Show in 
detail that T ⊆ S. 


5. If S and T are sets, show that S\T = ∅ if and only if S ⊆ T. 


6. The set-theoretic equivalent of exclusive disjunction is called the symmetric 
difference of two sets and is given by S Δ T = (S\T) ∪ (T\S). 


(a) Make a Venn diagram illustrating S Δ T. 

(b) Show that S Δ T = (S ∪ T)\(S ∩ T). 

(c) Explain why this is like exclusive disjunction. 


7. Unions and intersections involving infinite collections of sets are defined as 
follows. Let {S_α : α ∈ A} be a collection of sets (A is an index set that can be 
of any size). Then 


⋃_{α ∈ A} S_α = {x : ∃α ∈ A ∋ (x ∈ S_α)} 
and ⋂_{α ∈ A} S_α = {x : ∀α ∈ A (x ∈ S_α)}. 


If the index set is the set of natural numbers, we write 


⋃_{n=1}^∞ S_n or ⋂_{n=1}^∞ S_n. 


If the index set is known from the context of the problem, we can write 


⋃_n S_n, ⋃_α S_α, ⋂_n S_n, or ⋂_α S_α, 


which mean "the union or intersection over all possible values of n or a." 
When we use a Greek letter for a subscript, we are making no statement 
about the size of the index set, while using a roman letter indicates a 
countable index set (the size of a set and what it means for a set to be 
countable are discussed in the next chapter). 


(a) State the definitions of union and intersection in words. 
(b) Show that the distributive laws hold in the following forms: 


⋃_{α ∈ A} (T ∩ S_α) = T ∩ (⋃_{α ∈ A} S_α) 
and ⋂_{α ∈ A} (T ∪ S_α) = T ∪ (⋂_{α ∈ A} S_α). 


(c) Show that the union of a collection of sets is the smallest set that contains 
all the sets in the collection. 


(d) Show that the intersection of a collection of sets is the largest set that is 
contained in all the sets in the collection. 


8. State and prove analogues of DeMorgan's laws for infinite collections of sets. 


9. If a collection of sets is indexed with the natural numbers, we define the 
limit superior and limit inferior of the collection, respectively, as follows: 


lim sup{S_n} = ⋂_{k=1}^∞ (⋃_{n=k}^∞ S_n) 
and lim inf{S_n} = ⋃_{k=1}^∞ (⋂_{n=k}^∞ S_n). 


(a) Show that lim sup{S_n} = {x : x ∈ S_n for infinitely many n}. 

(b) Show that lim inf{S_n} = {x : x ∈ S_n for all but finitely many n}. 

(c) Show that lim inf{S_n} ⊆ lim sup{S_n}, and that they might not be equal. 

(d) If S₁ ⊆ S₂ ⊆ ..., show that lim sup{S_n} = ⋃_{n=1}^∞ S_n. 

(e) If S₁ ⊇ S₂ ⊇ ..., show that lim inf{S_n} = ⋂_{n=1}^∞ S_n. 

(f) Give examples of collections where the identities in (d) and (e) fail. 


10. (a) Show that a set can be an element of itself. (In some constructions of set 
theory, there is an axiom that prevents this. We will see why shortly.) 


(b) Show that a set might not be an element of itself. 


(c) Consider the collection of sets R = {S : S ∉ S}. Show that both the 
assumptions R ∈ R and R ∉ R lead to contradictions. If R were a set, one or 
the other of these statements would have to be true. We must conclude that R 
is not a set. (This is called "Russell's Paradox," after Bertrand Russell. 
Russell's discovery of this problem forced a radical change in the way 
mathematicians of the time viewed set theory.) 


1.16 A GLOSSARY 


Here are a few terms that will be used throughout the book. We have already 
said what axioms, statements, and proofs are. 


A theorem consists of one or more statements (the hypotheses) that we intend to 
prove imply one or more other statements (the conclusions). We should view a 
theorem as a dynamic process that is not over until the proof is complete. Once 
we have proved a theorem, it becomes a (true) statement. 


The structure of a theorem should help us decide which statements are 
hypotheses and which are conclusions. If this is unclear, we should rewrite the 
theorem (being careful not to change its meaning) in the form: 


IF {the hypotheses} THEN {the conclusions}. 


Owing to laziness and tradition, theorems are not always stated this way, but we 
should always be prepared to convert one to this form if necessary. We will 
indicate the end of a proof with the symbol ■. Sometimes we put this symbol 
immediately after the statement of a theorem, where it means either "We have 
already proved this" or "You will prove this yourself." It is traditional in 
mathematics texts to print the statements of theorems in type that looks like this. 
You have already seen this done from time to time in this chapter. 


The converse of the theorem A ⇒ B is the theorem B ⇒ A. The proof of the 
converse of a theorem is a separate issue from the proof of the original theorem. 
It is a common error to mistake the truth of a theorem for the truth of its 
converse. "I see what I eat" (If I eat it, then I see it) is different from "I eat what I 
see" (If I see it, then I eat it). 


An if and only if theorem is a theorem in conjunction with its converse. The 
two parts must be proved separately. 


A definition has the form of an if and only if theorem, where one of the parts 
(the one being defined) had no previous meaning. A definition is taken to be 
true, and can't be proved. In this way, definitions play much the same role as 
axioms. As we have done already, we will use boldface to indicate the term 
being defined, for instance: "A triangle is equilateral if and only if all its sides 
have the same length." Tradition again leads to bad habits, and definitions are 
usually written with "if" where they should have "if and only if." A definition 
always includes (if only implicitly) the connective "if and only if." 


A lemma is a special sort of theorem that represents a portion of the proof of 
another theorem, pulled aside to be proved separately. Lemmas are pieces of our 
chains from the middle, which may be assembled (that is, proved) independently 
of the rest of the chain. Arranging a proof into lemmas is a bit of an art. It is 
usually done to improve the organization of a proof, but sometimes because a
lemma is interesting in itself. In practice, it is almost always done after a proof is
complete (when we notice that some part of a proof can stand on its own). 
Calling something a lemma indicates that it will be used primarily as part of 
another proof. 


A corollary is also a theorem, usually stated just after another has been proved, 
whose proof is based mostly on the just-completed theorem. If we can prove that 
n < 2ⁿ for all n, the statement "n − 1 < 2ⁿ for all n" is a corollary. (The proof
would start: "Note that n − 1 < n and refer to the previous theorem.") Sometimes
a corollary makes reference to some part of a preceding proof, rather than to the 
whole theorem, or to more than one theorem. In any event, calling something a 
corollary indicates that its proof will consist primarily of a reference to another 
theorem. 


We will think of functions in a very simplistic way. A function f from the set A 
to the set B will be denoted f : A → B, where A is called the domain of f, and B
is called the range (or sometimes the codomain¹¹). We view the function itself
as a rule by which we can, for each element of A, produce an element of B. We
speak of the elements of A as "inputs," and the result of applying the rule f to an
input x as an "output," which we denote f(x). 


1.17 PLAIN AS THE NOSE ON YOUR FACE? 


A final note on the careful use of language. Words like "it is obvious" or 
"clearly" have no place in correct proofs. They are used all the time anyway, and 
they are used in this book. What they mean in this book is "I leave it to you to 
supply the details and am very sure that you can (but you should do so before 
moving on)." Stories abound of lengthy and intense discussions in class aimed at 
deciding whether some statement or another is indeed "obvious" (some of these 
stories are even true). If you see the humor in this, you're well on your way to 
becoming a mathematician. Here is a simple guideline for now: If you have to
think about something at all (if you are even momentarily unsure that it is true),
then it is not obvious. That 1 + 1 = 2 is obvious. That x = 6 is a solution to
2x² + 5x − 102 = 0 is not obvious.


EXERCISES 1.17 


1. It has been suggested that one should avoid having a mathematician on a
jury, because they have difficulty with the concept of "reasonable doubt."
Discuss. 


1 This is the price mathematicians pay for the power to make proofs. In mathematics we find things
that we are compelled to believe (some of which we might prefer not to). Practitioners of other fields have 
more flexibility to pick and choose what they will accept. 


2 Whether the last of these is a statement is actually open to debate. Most mathematicians would agree 
that it certainly is either true or false (all the while admitting they don't know which), but to a group known 
as "intuitionists" it is neither, despite its unambiguous form. See footnote 7 in this chapter. 


3 There are several other ways of saying this: "if A then B"; "B follows from A"; "B if A"; "A only if B";
"A is sufficient for B"; "B is necessary for A." Note the individual meanings of the words "if" and "only if" 
and of "necessary" and "sufficient." 


4 Russell once gave a speech in which he asserted "From false premises I can reach any conclusion." A
voice in the audience said: "Assume that 1 = 2 and prove that you are pope." (Russell's opinions on religion 
made this a good joke.) Now the statement "If 1 = 2 then B. R. is pope" is true without further explanation, 
but Russell rose to the challenge: "The pope and I are two, therefore we are one." 


5 Exercises and Examples will be referred to by numbers indicating the section in which they appear. 
Exercise 1.4.2, for example, is the second exercise in Section 1.4; Example 18.2.1 is the first example in 
Section 18.2. 


6 It is sometimes difficult to tell the difference between a proof by contrapositive (Exercise 1.4.3.c) and
a proof by contradiction. A proof that ends specifically with the statement "not A" is a proof by 
contrapositive, while the contradiction in a proof by contradiction can arise from a number of sources. If 
you start out to do a proof by contradiction ("assume A and not B") and end with the contradiction "A is 
both true and false," your proof probably can be constructed as a contrapositive. 


7 One shouldn't conclude from this example that the issue is insignificant. Maybe statements should not 
be classified "true" and "false" or even as "true," "false," and "matters of opinion." It might be that "true," 
"false," and "it is impossible to tell" are the correct categories. This is a serious philosophical issue. 


8 It might seem that "Let x is greater than 1" doesn't make much sense. This should be read "Let x be
greater than 1." The second version should be read: "Let x—which is greater than 1—be given." The phrase 
"be given" reminds us that we do not get to pick the value of x ourselves. We know only, in this case, that it 
is greater than 1. 


9 This helps remind us that the proof can rely only on the assumption that x ∈ A. Note well that in
saying "Let x ∈ A" at this point in a proof we are not asserting that A actually has any elements. If such a
proof is valid, it will be so if A = ∅.


10 One must be careful using the words "smallest" and "largest" in this context. The smallest set with a 
given property is that set (if any) that (i) has the property and (ii) is contained in any set having the property. 
The largest set with a given property is that set (if any) that (i) has the property, and (ii) contains any set
having the property. 


11 There is an important distinction between the two terms, but it won't become an issue for us until
Chapter 8. 


Chapter 2 


Finite, Infinite, and Even Bigger 


2.1 CARDINALITIES 


When we count a set, we try to match its elements with the elements of some 
initial segment of the natural numbers: {1, 2, ..., n}. To be sure we've counted
correctly, we should check that (i) each element of the set is associated with an
element of the initial segment; (ii) each element of the initial segment is
associated with an element of the set; (iii) no element of the set is associated with
more than one element of the initial segment; and (iv) no element of the initial 
segment is associated with more than one element of the set. Having done all 
this, we say that n is the number of elements in the set. Here we make these 
ideas precise. 


DEFINITION 2.1: (a) A function c : S → T is a one-to-one correspondence if
it has the following properties:

(i) ∀x ∈ S ∀y ∈ S (x ≠ y ⇒ c(x) ≠ c(y)) [that is, c is one-to-one], and
(ii) ∀z ∈ T ∃x ∈ S (c(x) = z) [that is, c is onto].
(b) If a one-to-one correspondence exists between two sets, we say that the sets
are (or can be put) in one-to-one correspondence.
(c) A set is finite (and contains n elements) if it can be put in one-to-one
correspondence with an initial segment {1, 2, ..., n}.
(d) A set that is not finite is infinite. 
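
For small finite sets, the two conditions in Definition 2.1(a) can be checked by brute force. The following Python sketch is only an illustration: the function names and the sample sets are our own choices, and the check makes sense only when S and T are finite.

```python
def is_one_to_one(c, S):
    """Distinct elements of S must have distinct outputs."""
    outputs = [c(x) for x in S]
    return len(set(outputs)) == len(outputs)


def is_onto(c, S, T):
    """Every element of T must be the output of some element of S."""
    return set(T) <= {c(x) for x in S}


def is_correspondence(c, S, T):
    """Both conditions of Definition 2.1(a), checked by brute force."""
    return is_one_to_one(c, S) and is_onto(c, S, T)


# Hypothetical example: counting a three-element set against {1, 2, 3}.
S = {"apple", "cat", "bunny"}
T = {1, 2, 3}
count = {"apple": 1, "cat": 2, "bunny": 3}

print(is_correspondence(count.get, S, T))    # True
print(is_correspondence(lambda x: 1, S, T))  # False: neither one-to-one nor onto
```

Counting the set S amounts to exhibiting the function count; the checks confirm that nothing was skipped and nothing was counted twice.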


You will show in Exercise 2.1.1 that this definition is not as directional as it 
seems, and that two sets that can each be put in one-to-one correspondence with 
a third set can be put in one-to-one correspondence with each other. This paves 
the way for the next two definitions. 


DEFINITION 2.2: (a) Two sets have the same cardinality if they can be put in 
one-to-one correspondence with each other. 
(b) If such sets are finite, we say they have the same number of elements. 


The simple idea of associating sets this way in order to count their elements has
surprising consequences, as we shall see. Sets that seem to be of very different
sizes can have the same cardinality. For instance, the set of even natural numbers 
can be put in one-to-one correspondence with all the natural numbers, even 
though the set of all natural numbers seems to be bigger: 


1   2   3   4   5   6   7   ...
↕   ↕   ↕   ↕   ↕   ↕   ↕
2   4   6   8   10  12  14  ...

When we learned as children about the number 3, for instance, we were shown 
example after example of sets with three elements: three apples, three cats, two 
carrots and a bunny (things were simpler then). We came to understand that 
these sets had something in common that had nothing to do with their specific
elements. That common property is what we learned to call "3." This is a
difficult idea to make precise, but the following definitions tell part of the story. 


DEFINITION 2.3: (a) A cardinality is the property common to a collection of 
sets that can be put in one-to-one correspondence with each other, but that is not 
shared by any set that can't be put in one-to-one correspondence with a set in the 
collection. 


(b) If the sets in such a collection are finite, the cardinality is also said to be 
finite. A finite cardinality is called a natural number.¹


Notice that we don't refer to "the set of all sets that can be put in one-to-one
correspondence with each other," since such a collection is so large that it falls 
outside the realm of set theory (see Exercise 1.15.10). 


EXERCISES 2.1 


1. (a) In what way is Definition 2.1.a "directional"? 


(b) Show that (i) any set can be put in one-to-one correspondence with itself;
(ii) if A can be put in one-to-one correspondence with B, then B can be put in
one-to-one correspondence with A; and (iii) if A and B can both be put in 
one-to-one correspondence with C, then A can be put in one-to-one 
correspondence with B. 


(c) Explain how (ii) serves to eliminate the directional quality of Definition 
2.1.a. 


2. (a) By considering the function f(x) = arctan x, show that the set of real
numbers has the same cardinality as the interval (−π/2, π/2). ["Interval"
hasn't been precisely defined yet, nor has the set of real numbers, but this
should not be a problem here.]

(b) By considering the function f(x) = x/(1 + |x|), show that the set of real
numbers has the same cardinality as the interval (−1, 1).


(c) Show that any two nonempty open intervals have the same cardinality. 


(d) Show that the set of real numbers has the same cardinality as any 
nonempty open interval. 

(e) Show that the closed interval [0, 1] has the same cardinality as the open 
interval (0, 1). (You might not be able to do this by defining a function. You 
may want to delay this until you have considered Exercise 2.3.7.) 

(f) Show that the set of real numbers has the same cardinality as any 
nonempty closed interval that does not consist of a single point. (You can do 
this whether you have solved (e) or not.) 


3. (a) Let I be the set of decimals of the form 0.d₁d₂ .... Construct a one-to-one
function from I to I × I.


(b) Find either an onto function from I to I × I or a one-to-one function from
I × I to I.


(c) Do I and I × I have the same cardinality?


4. (a) Let f : A → B be one-to-one and onto. Define g : B → A by saying a =
g(b) ⇔ b = f(a). Show that g is a function and that it is one-to-one and onto.


(b) Show that, if a ∈ A, then g(f(a)) = a, and if b ∈ B, then f(g(b)) = b.
Explain why g is called the inverse function of f (inverse functions are 
discussed further in Chapter 8). 


(c) Examine your proof of (a) and carefully pick out which properties of f
lead to which properties of g. (For instance, we know that f is one-to-one.
Does this fact alone tell us anything about g?)


(d) If A is finite and f : A → B is one-to-one, show that the number of
elements of f(A) = {y ∈ B : ∃x ∈ A (y = f(x))} is the same as the number of
elements of A. 


(e) Show that it is impossible to have a one-to-one correspondence between a 
finite set and one of its proper subsets. (Hint: Suppose that it is possible to
have C ⊆ B, C ≠ B, and a one-to-one function f : B → C. Consider the set B
having the smallest number of elements for which this happens.) 


5. Here we generalize the results of Exercise 2.1.1. A relation on a set X is a set
of ordered pairs of elements of X. A relation is often denoted by a symbol
like ~, and we write "x ~ y" (and say "x is related to y") to indicate that (x, y)
is an element of the relation. The relation ~ is called an equivalence relation
on X if it has the properties:


(i) x ~ x for all x ∈ X
(ii) if x ~ y then y ~ x
and (iii) if x ~ y and y ~ z then x ~ z.


(a) Show that = and < are equivalence relations on the real numbers, but that 
< is not. (For instance, show that the relation defined by saying x ~ y if x = y
is an equivalence relation.)


(b) Is ⊆ an equivalence relation on sets?
(c) Is ... is related to ... an equivalence relation on the set of people? 
(d) Is ... is acquainted with ... an equivalence relation on the set of people? 
(e) Let X be a set and ~ an equivalence relation on X. For any element a of X,
let X_a = {x ∈ X : x ~ a}. Show that:

(i) X_a ≠ ∅ for all a,

(ii) if X_a ∩ X_b ≠ ∅, then X_a = X_b,

and (iii) X = ⋃_a X_a.

The set X_a is called the equivalence class of a. A collection of subsets {X_a}
of a set X having properties (i), (ii), and (iii) is called a partition of X.
(f) If X is a given set and {U_a} is a partition of X, show that there is an
equivalence relation ~ on X so that {U_a} is the collection of equivalence
classes of ~.


(g) Show that the conditions defining an equivalence relation are independent 
by giving three examples of relations having two of the properties but not the 
third. 


2.2 INFINITE SETS 


Though we have a definition of what it means for a set to be infinite, we don't
know whether there are any infinite sets!


THEOREM 2.4: The set of all natural numbers, N, is infinite.


PROOF: We show that N can't be put in one-to-one correspondence with any
initial segment of N. Suppose I = {1, 2, ..., n} is an initial segment and that c : I
→ N is a function (it doesn't matter now whether c is one-to-one or not). The
collection of outputs of c is finite since it can be put in one-to-one
correspondence with a subset of I.
Let s = c(1) + c(2) + ... + c(n) + 1. Then s is a natural number that is larger than
any of the outputs of c. Thus s ∈ N is not an output of c, and c is not onto. No
function from any initial segment to N can be onto, and consequently N is not
finite. ∎
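
The key step of the proof, producing a natural number that a given function on an initial segment cannot output, is easy to try out. The Python sketch below is only an illustration of that step: the name missed_output and the sample function c are hypothetical, and the outputs of c are assumed to be natural numbers (at least 1).

```python
def missed_output(c, n):
    """Return s = c(1) + c(2) + ... + c(n) + 1.

    Since every c(k) is a natural number, s is larger than each output of c
    on the initial segment {1, ..., n}, so c never takes the value s there.
    """
    return sum(c(k) for k in range(1, n + 1)) + 1


# Hypothetical function from {1, ..., 5} into N.
c = lambda k: k * k

s = missed_output(c, 5)                # 1 + 4 + 9 + 16 + 25 + 1
outputs = {c(k) for k in range(1, 6)}
print(s, s in outputs)                 # 56 False
```

Whatever function is substituted for c, the number s is missed, which is exactly why no function from an initial segment to N can be onto.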


In the context of cardinality, the words "larger" and "smaller" take on new 
meanings. Ordinarily, we would say that the set of natural numbers is larger than 
the set of even natural numbers because the former contains all the elements of 
the latter and more. But in the sense of cardinality, these two sets are the same 
size. The footnote to Exercise 2.1.3 suggests a way of sorting this out. We will 
say that set A has smaller cardinality than set B if no function from A to B is
onto, and that A has larger cardinality than B if no function from A to B is
one-to-one.³


Here is a more striking example. The set of rational numbers, Q, seems to be 
very much larger than the set of natural numbers. We will show that these two 
sets have the same cardinality! We begin by showing that the set of positive 
rational numbers has the same cardinality as N. 


Here are the positive rational numbers:

1/1   1/2   1/3   1/4   ...
2/1   2/2   2/3   2/4   ...
3/1   3/2   ...
...

We will match the natural numbers with the entries in this table in a way that is 
one-to-one and onto. Starting at the upper left, associate 1 with 1/1. Move down, 
to 2/1. Check whether this rational number has already been assigned some 
natural number. It hasn't, so we assign to it the next natural number, 2. If it has 
already been assigned a natural number, just skip over it. Move up and to the
right, to 1/2, and do the same thing. Proceed through the table in the pattern
shown in the next diagram. This describes the desired one-to-one 
correspondence (we don't always need a formula to have a function) and shows 
that the set of positive rational numbers has the same cardinality as the set of 
natural numbers. You will finish the proof in Exercise 2.2.1. (Note that 2/2 has 
already been assigned a natural number when we get to it.) 


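[Diagram: the table of positive rational numbers, with circled natural numbers
marking the order of assignment: 1 at 1/1, 2 at 2/1, 3 at 1/2, and so on. The
entry 2/2 receives no number.]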
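
The matching can also be carried out mechanically. The following Python sketch walks the table one diagonal at a time and skips any fraction whose value has already appeared. The order within each diagonal (from the bottom left to the upper right) is an assumption that agrees with the first three steps described above, and the name positive_rationals is our own.

```python
from fractions import Fraction
from itertools import islice


def positive_rationals():
    """Yield each positive rational exactly once, diagonal by diagonal."""
    seen = set()
    total = 2  # numerator + denominator along the current diagonal
    while True:
        for numerator in range(total - 1, 0, -1):
            q = Fraction(numerator, total - numerator)
            if q not in seen:  # 2/2 is skipped because 1/1 was already counted
                seen.add(q)
                yield q
        total += 1


first = list(islice(positive_rationals(), 6))
print([str(q) for q in first])  # ['1', '2', '1/2', '3', '1/3', '4']
```

The natural number assigned to a rational is its position in this list, so 1/1 gets 1, 2/1 gets 2, and 1/2 gets 3. No formula is needed; the procedure itself is the function.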


We are generally interested in sets with only three types of cardinalities: natural
numbers (the cardinalities of finite sets), the cardinality of N (which is denoted
ℵ₀, aleph zero or aleph naught⁴), and those that are bigger. A set with cardinality
ℵ₀ is said to be denumerable. A set that is finite or denumerable is said to be
countable. If a set is not countable, it is uncountable. 

Upon careful examination, we see that the proof of the countability of the 
rational numbers establishes more than we claimed for it. It really shows that the 
union of any denumerable collection of denumerable sets is denumerable. There 
are several ways to state a result like this, depending on precisely how many sets 
there are and how many elements each of them has. They can be summed up in 
the following very useful theorem, whose proof is Exercise 2.2.2. 


THEOREM 2.5: The union of a countable collection of countable sets is 
countable. ∎


EXERCISES 2.2 


1. (a) Complete the proof that Q has cardinality ℵ₀.


(b) While proving that the positive rational numbers are denumerable we 
might observe that, if we knew that the Cantor-Bernstein-Dedekind theorem 
was true (see Exercise 2.1.3), we wouldn't have to fuss over whether or not 
we had already counted each of the rational numbers when we get to it in the 
array. Explain. 


2. (a) State the results that are included in Theorem 2.5 (there are four that are 
easy to state, and others that are more complicated). 


(b) Prove Theorem 2.5. 


3. Show that a set is countable if and only if it can be put in one-to-one 
correspondence with a subset of N. 


4. (a) If d₁ is a digit (one of the symbols 0, 1, ..., 9), how many decimals are
there of the form 0.d₁000 ...? (Say exactly.)


(b) How many decimals are there of the form 0.d₁d₂000 ...? (Say exactly.)
(c) How many decimals are there of the form 0.d₁d₂ ... dₙ000 ... for a given
n?


(d) Show that there are countably many terminating decimals. 


5. If a number is a solution to a polynomial equation with coefficients that are 
integers (for example, 3x² − 7x + 19 = 0) it is called algebraic. For instance,
√2 is algebraic since it is a solution to x² − 2 = 0.

(a) Show that all rational numbers are algebraic. 


(b) For any n, show that the collection of polynomial equations with integer 
coefficients and degree less than n is countable. 


(c) Show that the set of algebraic numbers is countable. 


6. The expression 2 + 3 = 5 may be interpreted "The union of disjoint sets with
two and three elements is a set with five elements," while 2 × 3 = 6 says "The
Cartesian product of a set with 2 elements and a set with 3 elements is a set
with 6 elements." We may (loosely) interpret the expression ℵ₀ + 1 = ℵ₀ to
mean "The union of a denumerable set with a one-element set is
denumerable." Interpret each of the following in words, and prove them:


(a) ℵ₀ + ℵ₀ = ℵ₀.
(b) ℵ₀ + n = ℵ₀ for any n ∈ N.
(c) ℵ₀ × ℵ₀ = ℵ₀.


(d) Why does the first pair of sets mentioned in this problem have to be 
disjoint, while the second pair does not? 


7. (a) The set of ordered pairs of elements of a set S is called (after René 
Descartes) the Cartesian product of S with itself and is denoted S × S. Use
a technique similar to that used to prove the countability of the rational
numbers to show that N × N has the same cardinality as N.


(b) Show that the function given by 


$$F(i, j) = \binom{i + j - 1}{2} + i$$


maps N × N one-to-one and onto N. The expression $\binom{n}{k}$ is referred to as a
binomial coefficient, often pronounced "n choose k." (Note that it is not n/k.)
It is the coefficient of xᵏ in the binomial expansion of (1 + x)ⁿ. It is taken to be
0 if n < k and is given by

$$\binom{n}{k} = \frac{n!}{k!\,(n - k)!}$$


otherwise. 


(c) Examine the pairs of numbers that get mapped to 1, 2, ... by the function 
in (b). Is there a connection between your proof of (a) and the function in 
(b)? 


2.3 UNCOUNTABLE SETS 


We find ourselves in a familiar position. We have defined what it means for a set 
to be "uncountable," but we don't know whether any uncountable sets exist. Now 
we will show that there is one. Consider the set of nonterminating decimals with 
all digits to the left of the decimal point zero. (It wouldn't hurt to think of these 
as the real numbers between 0 and 1, but we have yet to explore the relationship 
between decimals and the real numbers.) A function whose range is this set and 
whose domain is the natural numbers may be thought of as an infinitely long list: 


0.63548...
0.30126...
0.11670...
...


We will construct a decimal that is not on this list (that is, one that is not in the 
range of this function). If the first place of the first entry on the list is 6 (it is),
make the first place of the new decimal 7. If the first place of the first entry is
not 6, make the first place of the new decimal 6 (any two new digits will do, but 
we should not use 0 for one of them). Now look at the second decimal place of 
the second entry on the list and select the second place of the new decimal in the 
same way. Move on to the third place of the third entry, and so on. Our new 
decimal begins 0.767.... This is not one of the first three entries on the list, and in
fact the whole decimal constructed in this way is not on the list at all since it 
differs from the nth entry on the list at least in the nth decimal place. We have 
shown that no function from N to the set of such decimals is onto, and therefore 
that this set is uncountable. (Notice that it doesn't matter whether the function 
that describes the list is one-to-one.) 
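
The rule for choosing digits can be written out as a short procedure. The Python sketch below applies it to a finite portion of a list; this is only an illustration, since the actual argument concerns an infinite list. The name diagonal_decimal is hypothetical, and each entry is represented by the string of digits after its decimal point.

```python
def diagonal_decimal(entries):
    """Build a decimal that differs from the k-th entry in its k-th place.

    `entries` is a list of digit strings (the digits after "0."); the k-th
    string must have at least k digits.
    """
    new_digits = []
    for k, digits in enumerate(entries):
        # Use 7 where the diagonal digit is 6, and 6 everywhere else.
        new_digits.append("7" if digits[k] == "6" else "6")
    return "0." + "".join(new_digits)


listed = ["63548", "30126", "11670"]
print(diagonal_decimal(listed))  # 0.767
```

Because the new decimal disagrees with the k-th entry in the k-th place, it cannot equal any entry that has been examined, no matter how the list was produced.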


This process is called Cantor diagonalization, after Georg Cantor, who first 
studied these ideas extensively in the late nineteenth century. His theories were 
vigorously denounced by prominent authorities of the time, but quickly won out 
as their essential correctness was understood by more people. The birth of this 
subject is well worth investigating if you believe that scientists are never 
motivated by pettiness, or that there is no real connection between mathematics 
and truth. 


EXERCISES 2.3 


1. Explain the remark "any two new digits will do, but we should not use 0 for 
one of them" in the description of Cantor diagonalization. 


2. Why would anyone object to the ideas of cardinality? 


3. Here is a recreational application of the idea of uncountability. 
(a) Show that there are uncountably many theorems in mathematics. 


(b) "Mathematical English" can be written with a countable collection of 
symbols. Show that for a given natural number n, there are only countably
many proofs that can be written in fewer than n symbols.


(c) Show that for a given natural number n, there is a theorem whose proof
requires more than n symbols to write.
4. The power set of a set S, denoted P(S), is the set of all subsets of S. 
(a) List the entire power set of S = {a, b, c, d}.
(b) Define a function from this set S to P(S) [simply pick sets to be f(a), f(b),
f(c), and f(d)]. Note that some elements of S are also elements of their image
under your function, while others are not (either of these groupings might be 
empty). Make a list of the elements that are not elements of their images 
under your function (this list might also be empty). Note that the set you have 
just written down is not the image of any element under your function. 
Define another function and repeat this. Observe that the set you produce in 
the end never is an output of your function. 


(c) Here we show that if S is any set at all, the cardinality of P(S) is larger 
than the cardinality of S. Suppose f : S → P(S). Show that {x : x ∉ f(x)} is not
in the image of f. Explain how this proves that the cardinality of a power set
is larger than the cardinality of the original set. 


5. (a) Show that there are infinitely many different infinite cardinals.


(b) Are there uncountably many infinite cardinals? 


6. (a) Show that any set having an infinite subset is infinite.
(b) Show that any set having an uncountable subset is uncountable. 


(c) If A ⊆ B, show that the cardinality of B is not smaller than the cardinality
of A. 


7. (a) Show that the natural numbers can be put in one-to-one correspondence
with a proper subset of the natural numbers. 


(b) Give a plausible justification of the statement "Every infinite set has a 
denumerable subset." 


(c) Show that every infinite set can be put in one-to-one correspondence with 
a proper subset of itself (this is an alternative definition of "infinite"). Think
about the real number line as an example. Compare this with the result of
Exercise 2.1.4.e.


(d) Show that forming the union of an infinite set with a finite set does not 
increase the first set's cardinality. 


(e) Show that forming the union of an infinite set with a countable set does 
not increase the first set's cardinality. 


8. Assuming for the moment that the set of real numbers is uncountable, show
that there must be nonalgebraic numbers (see Exercise 2.2.5). Numbers that
are not algebraic are transcendental. The familiar numbers e and π are
transcendental, but the latter fact, especially, is difficult to prove. (That π is
transcendental was not proven until 1882.)


9. (a) Show that the set of subsets of a denumerable set is uncountable. 
(b) Show that the set of finite subsets of a denumerable set is denumerable. 


(c) Use (b) to show again that the set of terminating decimals is countable. 
(There is more work here than there might seem—be careful.) 


10. Carefully consider the difficulties that arise in Cantor diagonalization if one 
allows terminating decimals. Can the process be "patched up" to allow 
terminating decimals but avoid these problems? 


1 We are being a little free with our understanding of the natural numbers, and this is a rather circular
definition. An alternate characterization of "finite" can be found in Exercises 2.1.4 and 2.3.7. 


2 This exercise can be interpreted as saying that the cardinality of the inside of a square (I × I) is the
same as that of an interval on the number line (I), which should come as quite a surprise. But this is a trick
question. You showed in (a) that the cardinality of I is not larger than that of I × I. You showed in (b) that
the cardinality of I is not smaller than that of I × I. If the cardinality of A is both not smaller and not larger
than that of B, do A and B have the same cardinality? The result saying that this is so is called the Cantor- 
Bernstein-Dedekind theorem. It took some of the greatest minds of the era to prove this! 


3 These definitions leave us with still another question: If A has smaller cardinality than B, does B 
necessarily have larger cardinality than A? The answer is Yes, but this is remarkably difficult to show. If we 
take our definition to be "B has larger cardinality than A if A has smaller cardinality than B," we are left 
with an equally difficult question about functions. 


4 ℵ is the first letter of the Hebrew alphabet. ℵ₀ is the "first infinity."


5 The proof of this statement is beyond the scope of this book. Examine your answer carefully and pick 
out the assumptions you must make about set theory to make it work. 


Chapter 3 


Algebra of the Real Numbers 


3.1 THE RULES OF ARITHMETIC 


Just what is a real number? This is a deceptively subtle issue. Any guess we 
might make now would likely be, at best, incomplete. Failing to find an answer 
to this question, we might ask a simpler one: What can we do with real numbers? 
We know plenty about this. We can add and subtract, multiply and divide. The 
associative, commutative, and distributive laws, and such, occupied our attention 
for months when we were younger. Whatever the real numbers actually are, we 
know there are operations on them known as addition and multiplication that 
obey rules like these: 


(0) If we add or multiply two real numbers, we get a real number. 


(1) Addition is associative: If x, y, and z are any three real numbers, then (x + y)
+ z = x + (y + z).


(2) Addition is commutative: If x and y are any two real numbers, then x + y =
y + x.


(3) There is a special real number, 0 (the additive identity), having the property 
that 0 + x =x for any real number x. 


(4) For each real number, x, there is a real number, denoted —x, with x + (—x) = 0 
(—x is the additive inverse of x). 


(5) Multiplication is associative. 
(6) Multiplication is commutative. 


(7) There is a special real number, 1 (the multiplicative identity), having the
property that 1 × x = x for every real number x.


(8) For each real number x except 0, there is another real number, denoted 1/x or
x⁻¹, with x × x⁻¹ = 1 (x⁻¹ is the multiplicative inverse of x).


(9) Multiplication distributes over addition: If x, y, and z are any numbers, then
x × (y + z) = (x × y) + (x × z).


There are other rules of arithmetic, of course. For instance, if a and b are real 
numbers and a is not 0, there is a real number x so that a × x = b. This rule
(which gives us the important ability to solve equations) is not on the list 
because we can prove it from things that are already there. We want the list to be 
as short as possible. 


We use some rules so much that we might not even realize they are rules. For 
instance, we always use 0 as the additive identity, and it might never occur to us 
that another number might do that job just as well. Could this be? Suppose z 
works as an additive identity. Then 0 + z = 0 (because z works this way). But 0 + 
z = z (because 0 works this way). Now 0 + z can be only one thing, and so it 
must be that z = 0. We have shown that There is only one additive identity. 
Arguments like this are part of "algebra" (a large subject of which factoring 
polynomials and the like are small parts). 
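
The argument can be laid out as a single chain of equalities. This is only a sketch of the reasoning above; it assumes that "z works as an additive identity" means z + x = x for every real number x, and it makes explicit the one place where commutativity (rule 2) is used.

```latex
\begin{align*}
z &= 0 + z && \text{(rule 3: } 0 \text{ is an additive identity)} \\
  &= z + 0 && \text{(rule 2: addition is commutative)} \\
  &= 0     && \text{(assumption: } z + x = x \text{ for every } x\text{)}
\end{align*}
```

Only rules from the list appear in the chain, which is what will allow the same argument to be reused in other settings.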


3.2 FIELDS 


Unfortunately, the rules of arithmetic have nothing to say about our common 
notions of what real numbers are. They make no mention of decimal expansions 
or number lines; they offer no hint of how to add or multiply. We can put this 
lack of specificity to work for us. Notice, for instance, that all of the rules remain 
valid if we replace the word "real" with the word "rational" wherever it appears. 
Perhaps there are still other structures that obey these rules. What about the 
integers? We can add them and multiply them, but integers generally do not have 
multiplicative inverses. Some familiar structures obey these rules, some don't. 
We begin the process of generalization by giving a name to "anything that obeys 
the rules." 


DEFINITION 3.1: A set F, with operations + and ×, obeying rules (0) through
(9) above (with "real number" replaced by "element of F" everywhere it occurs) 
is called a field. 


Most of our early mathematical education consisted of discussions of the field 
structure of the rational numbers and real numbers. Remember that one of our 
goals in this book is to discover ways in which these two particular fields are 
different, and algebra can often be used to make such distinctions. The set of 
rational numbers, Q, is a field. The set of integers, Z, is not. We gather from this 
that Q is different from Z (that some integers look different from some rational
numbers is not sufficient evidence; see Exercise 3.2.5). But algebra alone doesn't
allow us to distinguish the set of rational numbers from the set of real numbers.
They are simply both fields.


Are there any other fields? Consider the two-element set {0, 1} and define 
two operations as follows (we will circle the symbols to remind us that they are 
not ordinary addition and multiplication, even though the things being "added" 
and "multiplied" look like numbers): 


0 ⊕ 0 = 0                 0 ⊗ 0 = 0
0 ⊕ 1 = 1 ⊕ 0 = 1         0 ⊗ 1 = 1 ⊗ 0 = 0
1 ⊕ 1 = 0                 1 ⊗ 1 = 1


It is easy to check that this structure obeys all the rules that define a field. This 
field is called Z₂.
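Because the set has only two elements, the check can also be done mechanically. The following is a minimal Python sketch, assuming the rules in question are the usual ones (closure, commutativity, associativity, identities, inverses, and distributivity). The names `add`, `mul`, and `is_field` are ours, chosen only for illustration; `add` and `mul` stand in for the circled operations.

```python
from itertools import product

ELEMENTS = (0, 1)

def add(x, y):
    """Circled addition on {0, 1}: 1 (+) 1 = 0, otherwise as usual."""
    return (x + y) % 2

def mul(x, y):
    """Circled multiplication on {0, 1}."""
    return (x * y) % 2

def is_field(elements, add, mul):
    """Brute-force check of the field rules on a finite set."""
    els = list(elements)
    pairs = list(product(els, repeat=2))
    triples = list(product(els, repeat=3))
    closed = all(add(x, y) in els and mul(x, y) in els for x, y in pairs)
    commutative = all(add(x, y) == add(y, x) and mul(x, y) == mul(y, x)
                      for x, y in pairs)
    associative = all(add(add(x, y), z) == add(x, add(y, z)) and
                      mul(mul(x, y), z) == mul(x, mul(y, z))
                      for x, y, z in triples)
    distributive = all(mul(x, add(y, z)) == add(mul(x, y), mul(x, z))
                       for x, y, z in triples)
    # Candidates for the additive and multiplicative identities.
    zeros = [z for z in els if all(add(z, x) == x for x in els)]
    ones = [e for e in els if all(mul(e, x) == x for x in els)]
    if not (closed and commutative and associative and distributive
            and zeros and ones):
        return False
    zero, one = zeros[0], ones[0]
    has_negatives = all(any(add(x, y) == zero for y in els) for x in els)
    has_inverses = all(any(mul(x, y) == one for y in els)
                       for x in els if x != zero)
    return has_negatives and has_inverses

print(is_field(ELEMENTS, add, mul))  # True
```

The same function returns False for {0, 1, 2, 3} with arithmetic modulo 4, since 2 has no multiplicative inverse there. A finite check like this is a convenience, not a substitute for understanding why each rule holds.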


Why would anyone do this sort of thing? Look at our proof that the real 
numbers have only one additive identity. It involved only the rules for a field, 
not any specific knowledge about real numbers. It works just as it is for any 
field, and so there is only one additive identity in any field. No matter how often 
we encounter fields, we need never prove this again. The generality of algebraic 
proofs gives them much power. We now know of three fields: Q, R, and Z₂.


Here are two more (you will check the details in Exercise 3.2.12). 


EXAMPLES 3.2: 1. The formal rational functions are the usual quotients of
polynomials, but we don't concern ourselves with annoying details like their
domains. For instance,


(7x⁴ + 2x³ − 4x² + 14x − 9) / (x³ − 6x² − 3x + 11)


is a formal rational function because of what it looks like (that is, because of
its form¹). We already know how to add and multiply rational functions, and it
is easy to check that this is a field.


2. The set C of complex numbers is defined by endowing a special symbol, 
usually i (electrical engineers use j, because i means something else to them),
with the property i² = −1 (we shall see later that no real number has this
property). Complex numbers are written a + bi, where a and b are real.
Operations are done as if i is a variable, with i² replaced by −1 whenever it
occurs, for instance: 


(4 + i) × (2 + 3i) = (4)(2) + (2)(1)i + (4)(3)i + 3i²
                   = 8 + 14i − 3
                   = 5 + 14i.
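The multiplication rule for complex numbers can be written out as a short Python sketch. Here a complex number a + bi is stored as the pair (a, b); that representation and the function name `multiply` are our own choices for illustration.

```python
def multiply(z, w):
    """Multiply a + bi and c + di, given as the pairs (a, b) and (c, d)."""
    a, b = z
    c, d = w
    # (a + bi)(c + di) = ac + (ad + bc)i + bd i^2, with i^2 replaced by -1.
    return (a * c - b * d, a * d + b * c)

print(multiply((4, 1), (2, 3)))  # (5, 14), that is, 5 + 14i

# Python's built-in complex type (which writes j for i) agrees.
print(complex(4, 1) * complex(2, 3) == complex(5, 14))  # True
```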


While it is fairly easy to check that these structures are fields, it is not so easy to 
see that they are different in any significant way from the real or rational 
numbers. We will be able to make these distinctions later by examining another 
kind of structure. We should keep in mind that the real numbers and the rational 
numbers play important roles in many areas of mathematics. We will refer freely 
to their algebraic properties, but in this book we will never again pause and say 
"Let's see if that proof works for any field" like we have here. This kind of 
thinking, though, is a basic modus operandi of the algebraist: 


(1) While studying some familiar object, pause to write down rules that 
govern what you have learned. 


(2) Examine your list to see if there is any redundancy. Is there anything 
that can be proved from other things on the list? Toss anything that can 
off the list. 


(3) Give a name to "anything that obeys the rules." At this point the rules 
of one subject become definitions for another. 


(4) Study the things you've just named. The first thing you will want to do 
is to look for "something that obeys the rules" that is different from the 
objects you were thinking about when you made the list. 


There is another very important process of algebra that we haven't seen. It goes 
like this: We know something about fields now. How important are, say, 
multiplicative inverses, anyway? We can add and multiply integers, and 
everything seems to work just fine. What happens if we remove rule (8) from the 
definition of a field? "Things that work this way" are not necessarily fields. Give 
them a new name and start up again! 


EXERCISES 3.2 


1. We know that the integers and the natural numbers are not fields, but which, 
if any, of the rules for a field do each of these sets satisfy? 


2. If x is an element of a field such that x² = x, show that either x = 0 or x = 1.


3. If a and b are elements of a field F and a ≠ 0, show that there is an x ∈ F
so that a × x = b.


4. If a and b are elements of a field, show that −(a + b) = −a − b.


5. (a) Verify that Z₂, as described in the chapter, is a field.
(b) Let F = {a, b}, and define operations on F by:


a ⊕ a = a                 a ⊗ a = a
a ⊕ b = b ⊕ a = b         a ⊗ b = b ⊗ a = a
b ⊕ b = a                 b ⊗ b = b


Show that F is a field.
(c) Explain how this field is similar to Z₂.


(d) Show that, other than changing their names, this is the only way to make 
a field of a set with two elements. 


6. Z₅ is the set {0, 1, 2, 3, 4} with arithmetic done modulo 5, that is, do the
usual operations and then subtract 5 repeatedly until the result is an element
of the set. We can do arithmetic modulo any natural number greater than 1,
so 3 + 4 = 2 (mod 5), 5 × 6 = 6 (mod 8), and 9 × 8 = 0 (mod 12), for
example.


(a) Show that Z₅ (or Z₃ or Z₇) is a field.
(b) Show that Z₄ (or Z₆ or Z₁₄) is not a field.


(c) For which values of n is Zₙ a field? State and prove a theorem.


7. Show that each element of a field has only one additive inverse and that each 
nonzero element has only one multiplicative inverse. 


8. Show that 0 × x = 0 for any x in any field.


9. (a) Consider the set D = {d}, with addition and multiplication given by d + d
= d and d × d = d (this is the only way the operations could be defined on
such a set). Show that D is a field. What is the additive identity in this field? 
What is the multiplicative identity? 


(b) In the field D given in (a), we have 0 = 1 (the additive identity equals the 
multiplicative identity), which is inconvenient. Show that 0 ≠ 1 in any field
with more than one element. (A field with only one element is not very 
interesting, and most algebraists include in their definition a stipulation that a 
field must have at least two elements.) 


(c) In the field D given in (a), multiplication and addition are the same (the
product of any two elements is the same as their sum). This must be true, of
course, in a field with only one element. Are there any other fields in which
this is the case?


10. Show that (−1) × x = −x in any field. (Note that (−1) × x is the product of
the additive inverse of 1 with x, while −x is the additive inverse of x.)


11. If x is an element of a field, show that (−x)² = x² (where x² = x × x).


12. (a) Show that C is a field. In particular, find a formula for (a + bi)⁻¹.


(b) Show that the set of formal rational functions is a field.


13. The structure described in the last paragraph of the chapter is called a 
commutative ring with unity. "Commutative" refers to the multiplication 
operation, and "unity" is the multiplicative identity. 


(a) Show that the integers and the set of polynomials are commutative rings 
with unity. 

(b) Guess the definition of "ring." (What happens when you leave out the 
adjectives?) 


(c) (If you've taken linear algebra ...) Show that the set of 2 × 2 matrices is a
ring with unity (but is not commutative). 


(d) One of the axioms deleted from the definition of "field" to obtain the 
definition of "ring" is the existence of multiplicative inverses. Is the set of 
invertible 2 × 2 matrices a field?


(e) Give an example of a ring without unity. 


14. Here is more linear algebra: Let M₂ consist of all 2 × 2 matrices of the
form

\[ \begin{pmatrix} a & b \\ -b & a \end{pmatrix} \]

where a and b are real numbers.


(a) Show that M₂ is a field under ordinary addition and multiplication of
matrices.


(b) Find the multiplicative identity in M₂.


(c) Find the square of \( \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} \).
(d) Discuss the relationship between M₂ and C.


15. It is an underlying assumption of this text that we understand how the 
rational numbers work. Just to be sure ... 


(a) Show that the sum and product of two rational numbers is a rational 
number. 


(b) Show that the sum of a rational number and an irrational number is an 
irrational number. 


(c) Is (b) also true for products? 
(d) Is (a) true for pairs of irrational numbers? 
(e) Show in detail that Q is a field.


(f) Is the set of irrational numbers a field? 


16. (a) Let F be the set of all real numbers of the form a + b√2, where a and b
are rational numbers. Show that F is a field.


(b) If k > 0 is a rational number that doesn't have a rational square root, show
that the set {a + b√k : a, b ∈ Q} is a field.

(c) If the number k in (b) does have a rational square root, show that the set
constructed is just Q.

(d) Suppose k is a rational number that doesn't have a rational square root
(this time, k might be negative). We endow the symbol σ with the property
that σ² = k. Show that the collection of symbols a + bσ is a field (note that
they may not be real numbers). Here multiplication and addition are carried
out as if σ were a variable, with σ² replaced by k whenever it appears. For
instance, we would have:


(1 + σ)(2 + σ) = 2 + 3σ + k.


(e) Suppose F is any field and k ∈ F is such that there is no element of F
whose square is k. Repeat part (d) of this exercise in this setting.


17. According to the previous exercise, the collection of symbols a + b√−5,
where a and b are rational numbers, is a field.


(a) Define an "absolute value" on this field by saying ‖a + b√−5‖ = √(a² + 5b²).
Show that this function resembles the absolute value in the sense that, if x
and y are in this field, then ‖x × y‖ = ‖x‖ × ‖y‖.


(b) We may define "integers" in this field to be those elements where a and b
are both integers. Show that, if x and y are "integers" and x divides evenly
into y, then ‖x‖ ≤ ‖y‖.


(c) Show that 1 + √−5 and 1 − √−5 are "prime" in this field in the sense that
the only numbers that divide evenly into either of these are 1 and the element
itself. Show that 2 and 3 are also prime in this field.

(d) Show that 6 = 2 × 3 = (1 + √−5) × (1 − √−5) in this field. (So it is possible
to factor 6 into primes in more than one way.)


¹ This is an example of the mathematical usage of the word "formal." It means that we should ignore
details that we would otherwise consider important. In other contexts such a discussion would be called 
informal! 


Chapter 4 


Ordering, Intervals, and Neighborhoods 


4.1 ORDERINGS 


When we first learned about number systems, "less" and "greater" meant "to the 
left" and "to the right" on the big number line above the blackboard. Together 
with pictorial interpretations of addition and multiplication, this view served its 
purpose quite well, and we learned a lot from it. Our understanding of algebra 
has matured, though, and we need a more precise idea of the ordering of the real 
numbers to go along with it. 


We have a lot of choices when we set out to impose an ordering on a set. 
Even the simplest of orderings can be described in more than one way, and we 
are free to pick a description that suits our purposes. But no matter how we 
describe the ordering of a set, our goal is to be able to pick any ordered pair of 
elements of the set, (a, b), and say whether the statement a < b is true or false.
The ordering itself is considered to be the set of pairs for which a < b is true. 
(The ordering technically focuses on the "<" symbol, but of course we also 
should know what ">" means.) 


EXAMPLES 4.1: 1. We could order words by simply counting their letters. 
Under such an ordering, we have, for instance, BIG < SMALL and FOUR < 
THREE. Many words that are not equal are also neither greater than nor less 
than each other under this ordering. 


2. Words are usually ordered by lexicographic (or dictionary) ordering. To 
decide which of two words is "smaller," we compare their first letters 
alphabetically. If the first letters are the same, we compare the second letters, and 
so on. To compare words of different lengths, we decree that the shorter word 
ends with a blank, which we take to be at the beginning of the alphabet. For 
instance, HEDRIN < HEPBURN and CAT < CATACLYSM. Here is a portion of 
this ordering from one dictionary: 


moonstruck < moonwalk < moonwort < moony < moor < moorage. 


Elements of an ordered set are said to be comparable if one of the statements a
< b, a > b, or a = b is true. Observe that, unlike the previous example, every
word is comparable to every other word in the lexicographic ordering (of course, 
there might be disagreement about which collections of letters are words 
—"moonwalk" certainly doesn't appear in all dictionaries). Notice also that in 
this ordering (in this dictionary), there is no word between "moonwort" and 
"moony." We will see that these are useful properties for an ordering to have. 

3. Let X = {a, b, c}. We may define an ordering on P(X) (the power set of X) 
by saying S < T if S ⊆ T and S ≠ T. Then, for instance, {a} < {a, b}. The <
relation can be described in a diagram: 


[Diagram: the subsets of X, with arrows pointing upward from each set to the sets that contain it.]


Here, A < B if it is possible to go from A to B by moving upward along arrows. 
Note that there are no horizontal arrows, because, for instance, neither {a} < 
{b}, {b} < {a}, nor {a} = {b} is true. In this ordering, {a} and {b} are not 
comparable. 


4. The decimals we introduced when discussing uncountable sets can also be 
ordered lexicographically. However, if we insist on thinking of them as the real 
numbers between 0 and 1, we get some surprising results. For instance, 0.37455 
< 0.702, as we would expect. However, we have 0.4999... < 0.5 under the 
lexicographic ordering, where we would expect these two decimals to be equal. 
In the lexicographic ordering of these decimals, as with words in the dictionary, 
there are pairs of different elements with no intervening element. This also does 
not happen in the usual ordering on the real numbers. Changing the ordering of a 
set can affect its structure dramatically. 
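The two orderings on words from Examples 4.1.1 and 4.1.2 can be sketched in a few lines of Python. This is only an illustration under our own assumptions: words are written with the 26 English letters, and the names `ALPHABET`, `lex_less`, and `length_less` are hypothetical.

```python
ALPHABET = " ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # the blank comes before every letter

def lex_less(u, v):
    """True if word u comes strictly before word v in dictionary order."""
    width = max(len(u), len(v))
    # Pad the shorter word with blanks, as described in Example 4.1.2.
    u, v = u.upper().ljust(width), v.upper().ljust(width)
    for cu, cv in zip(u, v):
        if cu != cv:
            return ALPHABET.index(cu) < ALPHABET.index(cv)
    return False  # the words are equal

def length_less(u, v):
    """The ordering of Example 4.1.1: compare words by counting letters."""
    return len(u) < len(v)

print(lex_less("HEDRIN", "HEPBURN"), lex_less("CAT", "CATACLYSM"))  # True True

words = ["moonstruck", "moonwalk", "moonwort", "moony", "moor", "moorage"]
print(all(lex_less(a, b) for a, b in zip(words, words[1:])))  # True

# Under the letter-counting order, CAT and DOG are different words,
# yet neither is less than the other: they are not comparable.
print(length_less("CAT", "DOG"), length_less("DOG", "CAT"))  # False False
```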


All the orderings we will study have the property mentioned in Example 4.1.2: 


Every element is comparable to every other element. The name we give this 
property is suggested by the "line" of words in the example. It is with this image 
in mind that we sometimes refer to the set of real numbers as the "real number 
line," and to real numbers as "points." 


DEFINITION 4.1: An ordering on a set is linear if, for any pair of elements of
the set (a, b), one and only one of the following holds:


(i) a < b;
(ii) a = b;
or (iii) b < a.


This is called the trichotomy, a word we learned as school children mainly 
because it sounds so fancy. While the orderings described in the examples above 
might be useful for some purposes, there does not seem to be any connection 
between what the elements of the sets are and the ordering imposed on them. In 
the next section we will see an example where the ordering of a set and the 
meaning of its elements come together. 
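For a finite set, the trichotomy can be tested directly by running through every pair. The following Python sketch does this; the function names are ours, and the ordering is passed in as a function `less(a, b)` that reports whether a < b.

```python
from itertools import combinations, product

def is_linear(elements, less):
    """Check that exactly one of a < b, a = b, b < a holds for every pair."""
    for a, b in product(elements, repeat=2):
        holds = [less(a, b), a == b, less(b, a)]
        if sum(holds) != 1:
            return False
    return True

def power_set(xs):
    """All subsets of xs, as frozensets."""
    xs = list(xs)
    return [frozenset(c) for r in range(len(xs) + 1)
            for c in combinations(xs, r)]

def proper_subset(s, t):
    """The ordering of Example 4.1.3: S < T if S is a proper subset of T."""
    return s < t  # for frozensets, < means "proper subset"

print(is_linear(range(1, 6), lambda a, b: a < b))  # True
print(is_linear(power_set("abc"), proper_subset))  # False: {a}, {b} fail
```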


EXERCISES 4.1 


1. If S is a subset of an ordered set, a least element of S is an element x, if there
is one, such that (i) x ∈ S and (ii) if y ∈ S and y is comparable to x, then x ≤
y.¹

(a) Show that a subset of a linearly ordered set can have at most one least 
element. 


(b) Show that in an ordering that is not linear a set can have more than one least
element. 


(c) Show that a subset of a linearly ordered set might not have a least 
element. 


2. Construct examples of ordered sets for which various combinations of parts 
of the trichotomy fail. 


3. Are the decimals we discussed, with the lexicographic ordering, linearly 
ordered? What if we consider the decimals to be real numbers? (Think about 
0.4999... and 0.5.) 


4. The following "proof" contains at least two serious errors. Find them and say 
why they are errors; then give an example to show that the result is false. 
(Note that the fact that the proof is incorrect is not enough to guarantee that 
the result is false. On the other hand, the fact that the result is false means 
that the proof must be incorrect, though knowing this may not help us find 
the errors.)


"THEOREM": Given any infinite subset of the real line, S, there are
infinitely many pairs of numbers (a, b) such that a < b and a, b ∈ S but
no element c of S satisfies a < c < b.


"PROOF": Let S = {s₁, s₂, ...}. Since s₁ < s₂ < ..., each of the pairs of
numbers (sₙ, sₙ₊₁) satisfies the conditions of the theorem.


5. (a) How many ways are there to impose an ordering on a set with two 
elements? Three elements? N elements?


(b) How many of these orderings are linear? 


6. If we agree that the denominators used in representing rational numbers 
should always be positive, their usual ordering is given by 


\[ \frac{p}{q} < \frac{r}{s} \iff ps < qr. \]


Show that this is a linear ordering. 


7. Let {S_α : α ∈ A} be a collection of sets and suppose that A is linearly ordered.
Define the limit supremum and limit infimum of the collection {S_α} as in
Exercise 1.15.9 and generalize the results there.


8. If X is any set, the power set of X can be ordered as in Example 4.1.3, by
saying that S < T if S ⊆ T and S ≠ T.
(a) Is this ever a linear ordering?
(b) Show that an ordering constructed in this way has the following property:
If S, T ∈ P(X), then there is a U ∈ P(X) such that S ≤ U and T ≤ U. This says:
"S and T may not be comparable, but there is something comparable to, and
bigger than, both of them." A set with an ordering having this property is called
a directed set.


(c) Suppose we order P(X) by saying S < T if S ⊇ T and S ≠ T (notice the
change!). Show that P(X) is a directed set with this ordering.


4.2 THE ORDERING OF THE NATURAL NUMBERS 


Even something as familiar as the usual ordering on the natural numbers may be 
described in more than one way. The first of these definitions is the more 
traditional, while the second is more closely related to our work in Chapter 2. 
You will show in Exercise 4.5.2 that the definitions are equivalent. 


DEFINITION 4.2: If m and n are natural numbers, we say m < n if either 
(a) n is among the natural numbers: m+1,m+1+1,m+1+1+1,... 
or 


(b) no function whose domain is a set with m elements and whose range is a set 
with n elements is onto. 
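For small values of m and n, the two descriptions in Definition 4.2 can be compared by brute force. The Python sketch below is only a spot check under our own naming (`less_by_successors` for part (a), `less_by_onto` for part (b)); it is not a proof that the descriptions agree.

```python
from itertools import product

def less_by_successors(m, n):
    """Part (a): n appears in the list m+1, m+1+1, m+1+1+1, ..."""
    current = m
    for _ in range(n):  # n steps are enough to reach n when it is in the list
        current += 1
        if current == n:
            return True
    return False

def less_by_onto(m, n):
    """Part (b): no function from an m-element set to an n-element set is onto."""
    codomain = range(n)
    # Each tuple of length m lists the values of one function on {0, ..., m-1}.
    for values in product(codomain, repeat=m):
        if set(values) == set(codomain):
            return False  # found an onto function
    return True

agree = all(less_by_successors(m, n) == less_by_onto(m, n)
            for m in range(1, 6) for n in range(1, 6))
print(agree)  # True
```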


4.3 WELL-ORDERING AND INDUCTION 


Definition 4.2 allows us to indicate (if not list) all the pairs in the ordering of the 
natural numbers. We will not spend time discussing the ordering of the natural 
numbers per se. Our interest is in a special property it has. 


DEFINITION 4.3: A linearly ordered set is said to be well-ordered if every 
nonempty subset of it has a least element.²


The rational numbers are not well-ordered. For instance, {p ∈ Q : p ≥ 9} has a
least element, but {p ∈ Q : p > 9} does not. That the natural numbers are
well-ordered is an axiom. If a set of natural numbers is given to us in a list, it is easy
to pick out its least element. The least element of {12, 6, 3, 173} is 3. On the 
other hand, if a set is described in some way, picking out its least element may 
not be so simple. What is the least element of the set of natural numbers that can 
be written as a sum of three primes but can't be written as a sum of two primes? 
Are there any? This may take some thought. We will come to see, however, that 
well-ordering helps us not so much by picking out the least element of a set as 
by guaranteeing that there is one. If we can find one natural number fitting some 
description, we know there is a least number fitting that description.


As an illustration of the importance of well-ordering, we will prove the
following theorem, which gives us an important technique of proof called
induction.


THEOREM 4.4: Suppose S ⊆ N is such that


(i) 1 ∈ S
and (ii) k ∈ S ⇒ k + 1 ∈ S whenever k ≥ 1.


Then S = N.


PROOF: Suppose S ≠ N and let T = N\S. Since S ≠ N, we have T ≠ ∅. By the
well-ordering property, T has a least element; call it t. Now t ≠ 1 because 1 ∈ S.
Let s = t − 1. Since t > 1, s is a natural number. Since t is the least element of T
and s < t, s can't be in T, and so s ∈ S. But then t = s + 1 ∈ S, a contradiction. ∎


The well-ordering property and the validity of induction are equivalent (you will 
verify this in Exercise 4.5.7). Either may be taken as an axiom of the natural 
numbers, with the other being a theorem. 


EXAMPLES 4.3: 1. Induction is often used to prove arithmetic formulas. Let us
show that 1 + 2 + ... + n = n(n + 1)/2. Let S be the set of natural numbers for
which this is true. Now 1 ∈ S since 1 = 1(1 + 1)/2. Suppose k ∈ S. Then


\[ 1 + 2 + \cdots + k = \frac{k(k + 1)}{2}. \]


To show that k + 1 ∈ S, we must show that


\[ 1 + 2 + \cdots + (k + 1) = \frac{(k + 1)(k + 2)}{2}. \]


But note that


\[ 1 + 2 + \cdots + k + (k + 1) = \frac{k(k + 1)}{2} + (k + 1) = \frac{k(k + 1) + 2(k + 1)}{2} = \frac{(k + 1)(k + 2)}{2}, \]


which is just what we want. ∎


2. Induction can also be used to prove general statements about finite sets. We
will show that every finite set has a largest element. This result pops up
repeatedly in our work (we could have used it to prove Theorem 2.4). Let S be
the set of natural numbers n for which "Any set with exactly n elements has a
largest element." Clearly 1 ∈ S. Suppose k ∈ S (so any set with exactly k
elements has a largest element) and let T be a set with exactly k + 1 elements.
Let t ∈ T. Then T\{t} has exactly k elements, and so it has a largest element, say
b. If t > b, then t is the largest element of T. If t < b, then b is the largest element
of T. In either case, T has a largest element since both t and b are in T.


3. Induction is a powerful tool that must be used with care. Failure to exercise 
proper caution can lead to some curious results. For instance: Let S be the set of 
natural numbers for which the statement "In any set with exactly n elements, all 
the elements are the same" is true. We will show that S = N. (Roughly translated, 
this means "Among all the objects in the universe that can possibly be elements 
of any finite set, there is only one thing.") This appears to be false, but .... 
Clearly 1 € S (in any set with exactly 1 element, all the elements are certainly the 
same). Suppose that k ∈ S (that is, in any set with exactly k elements, all the
elements are the same) and let T be a set with exactly k + 1 elements, say T = {t₁,
t₂, ..., tₖ₊₁}. Let T₁ = T\{t₁} and T₂ = T\{t₂}. Now T₁ and T₂ are both sets with
exactly k elements, and therefore all the elements of T₁ are the same and all the
elements of T₂ are the same. But tₖ₊₁ ∈ T₁ ∩ T₂, and so all the elements of T₁ are
the same as tₖ₊₁, and all the elements of T₂ are the same as tₖ₊₁. Therefore all the
elements of T are the same since they are all the same as tₖ₊₁. Something fishy is
going on here (unless you believe the result!). You will explain what it is in
Exercise 4.5.8.


In both of the real examples of induction, the set in question is taken to be the set
of natural numbers for which some open statement is true. We use this
observation to state induction in a form more useful for our purposes:


THEOREM 4.5: Suppose P(n) is an open statement, where n can be any
natural number. If


(i) P(1) is true
and (ii) P(k) ⇒ P(k + 1) whenever k ≥ 1,
then P(n) is true for all n ∈ N.


PROOF: All we need to do is generalize the arguments in the examples. Let
S = {n ∈ N : P(n) is true}. Suppose that S ≠ N (that is, that there is a natural
number n for which P(n) is false). Then N\S has a least element, say n₀. Since
P(1) is true, we know n₀ ≠ 1. Then P(n₀ − 1) is true [since n₀ − 1 < n₀, and n₀ is
the least natural number for which P(n) is false]. By (ii), it follows that P(n₀) =
P(n₀ − 1 + 1) is also true, a contradiction. ∎
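The role of the least counterexample in this proof can be seen in a small Python sketch. The statement P(n): "n < 10" and the helper names are our own, chosen only to illustrate the idea: when a statement fails somewhere, well-ordering supplies a least place n₀ where it fails, and the induction step must break down at k = n₀ − 1.

```python
def least_counterexample(P, bound):
    """Return the least n in 1..bound for which P(n) is false, or None."""
    for n in range(1, bound + 1):
        if not P(n):
            return n
    return None

def below_ten(n):
    """A statement that is true for n = 1, ..., 9 and false afterwards."""
    return n < 10

n0 = least_counterexample(below_ten, 100)
print(n0)  # 10

# P(n0 - 1) is true but P(n0) is false, so hypothesis (ii) fails at k = n0 - 1.
print(below_ten(n0 - 1), below_ten(n0))  # True False
```

The search is limited to a finite range, so it can locate a failure but can never establish that a statement holds for all n; that is what the theorem is for.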


Notice that statement (ii) of Theorem 4.5 is an implication. Its hypothesis is 
called the induction hypothesis. The moment in a proof at which this 
implication is established is usually called the induction step of the proof. It is a 
good idea to point out when this occurs. 


Doing proofs by induction can be a little unsettling. In using P(k) as a
hypothesis, it might look like we're assuming what we're trying to prove, an
activity we've been warned about at some length. Fortunately, this is not the
case. The thing we're trying to prove is a list of statements: P(1), P(2), ..., while
the thing we're assuming is just one entry in the list. In any event, we are not
assuming that P(k) is true; we are only showing that if P(k) is true, then so is P(k
+ 1).


4.4 ORGANIZING PROOFS BY INDUCTION


Some of the mystery can be taken out of induction by organizing the proofs
carefully. It is useful to begin by writing down and labeling the open statement,
and showing the variable clearly. For instance, we might wish to prove that³


\[ P(n):\quad 1^2 + 2^2 + \cdots + n^2 = \frac{n(n + 1)(2n + 1)}{6}. \]


Now write down and check P(1):


\[ P(1):\quad 1 = \frac{(1)(2)(3)}{6}. \]


Write the induction hypothesis, labeling it "Assume" (or something else to
indicate its role in the problem):


\[ \text{Assume } P(k):\quad 1^2 + 2^2 + \cdots + k^2 = \frac{k(k + 1)(2k + 1)}{6}. \]


Write the conclusion of the inductive step, labeling it something like "Want":


\[ \text{Want } P(k + 1):\quad 1^2 + 2^2 + \cdots + (k + 1)^2 = \frac{(k + 1)(k + 2)(2k + 3)}{6}. \]


Now the work begins. Very often (especially with arithmetic formulas), we can
see something resembling the induction hypothesis within the induction
conclusion. In this problem it's not difficult to find. Notice that we haven't
changed anything from the first line of this proof to the second; we have only
highlighted a term that was already there.


\begin{align*}
1^2 + 2^2 + \cdots + (k + 1)^2
  &= 1^2 + 2^2 + \cdots + k^2 + (k + 1)^2 \\
  &= \frac{k(k + 1)(2k + 1)}{6} + (k + 1)^2 \qquad \text{[induction hypothesis]} \\
  &= \frac{k(k + 1)(2k + 1) + 6(k + 1)^2}{6} \\
  &= \frac{(k + 1)(2k^2 + 7k + 6)}{6} \\
  &= \frac{(k + 1)(k + 2)(2k + 3)}{6}. \qquad \blacksquare
\end{align*}
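The algebra in the induction step is easy to get wrong, so it can be worth checking numerically before writing it up. The short Python sketch below (with function names of our own choosing) compares the sum with the closed form and tests the numerators that appear in the chain of equalities. It checks finitely many cases and is not a replacement for the proof.

```python
def sum_of_squares(n):
    """1^2 + 2^2 + ... + n^2, added up term by term."""
    return sum(j * j for j in range(1, n + 1))

def closed_form(n):
    """The right-hand side of P(n)."""
    return n * (n + 1) * (2 * n + 1) // 6

for k in range(1, 201):
    # P(k) itself.
    assert sum_of_squares(k) == closed_form(k)
    # The numerators in the induction step agree.
    first = k * (k + 1) * (2 * k + 1) + 6 * (k + 1) ** 2
    second = (k + 1) * (2 * k * k + 7 * k + 6)
    third = (k + 1) * (k + 2) * (2 * k + 3)
    assert first == second == third

print(sum_of_squares(3), closed_form(3))  # 14 14
```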


4.5 STRONG INDUCTION 


Consider the collection of numbers {f₁, f₂, ...} given by letting f₁ = 1, f₂ = 1, and
fₙ = fₙ₋₁ + fₙ₋₂ for n = 3, 4, .... It seems clear enough that these are all natural
numbers.⁴ Each is found by adding two things that were in turn obtained by
adding two things that (and so on and so on) were both natural numbers. Does
this constitute a proof? We should be very suspicious of arguments that involve
"and so on and so on" as a crucial step. We can recognize this as an induction
problem:


P(n): fₙ ∈ N
P(1): f₁ = 1 ∈ N    [P(2) is also true]
Assume P(k): fₖ ∈ N
Want P(k + 1): fₖ₊₁ ∈ N


But now the going gets a little rocky. The value of fₖ₊₁ depends not just on fₖ but
on both fₖ and fₖ₋₁. If we can't find a formula for fₖ₊₁ depending only on fₖ, we
would seem to be stuck (you are welcome to look for such a formula). We can
get around this by choosing P more carefully:


P(n): f₁, f₂, ..., fₙ ∈ N
P(1): f₁ = 1 ∈ N
Assume P(k): f₁, f₂, ..., fₖ ∈ N
Want P(k + 1): f₁, f₂, ..., fₖ, fₖ₊₁ ∈ N


Now it's easy (we actually have more information than we need). Simply note
that fₖ₊₁ = fₖ + fₖ₋₁, both of which are natural numbers by the induction
hypothesis (when k = 1, f₂ = 1 by definition). ∎
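The strengthened statement P(k) can be mirrored in a short Python sketch: at each stage the whole list f₁, ..., fₖ is kept, and the next entry is built from the last two. The function name `fib_list` and the assertion inside the loop are ours, added only to illustrate the structure of the argument.

```python
def fib_list(n):
    """Return [f_1, ..., f_n] with f_1 = f_2 = 1 and f_k = f_(k-1) + f_(k-2)."""
    values = []
    for k in range(1, n + 1):
        if k <= 2:
            values.append(1)  # f_1 and f_2 are given by definition
        else:
            values.append(values[-1] + values[-2])
        # P(k): every entry found so far is a natural number.
        assert all(isinstance(v, int) and v >= 1 for v in values)
    return values

print(fib_list(10))  # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```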


This is an outline of a procedure called strong induction (though we will see 
that this is somewhat of a misnomer), which is described in the following 
theorem. 


THEOREM 4.6: Induction is equivalent to the following: Let P(n) be an open
statement, where n can be any natural number. If


(i) P(1) is true
and (ii) (P(1), ..., and P(k)) ⇒ P(k + 1) whenever k ≥ 1, then P(n) is true
for all n ∈ N.


We will show only that this statement implies induction. (This direction of the
proof is of the form "weaker ⇒ stronger," and so it is the only one about which
there is any real doubt.) While what is actually going on in this proof is not
difficult, the structure of the proof is a little complicated. It is of the form (A ⇒
B) ⇒ (C ⇒ B), where A consists of the hypotheses of Theorem 4.6, B is the
statement "P(n) is true for all n," and C consists of the hypotheses of induction
(Theorem 4.5). The contrapositive of this is (not (C ⇒ B) ⇒ not (A ⇒ B)), or ((C
and not B) ⇒ (A and not B)). This is equivalent to C ⇒ A, which is what we will
prove.


PROOF: Suppose that P(n) is an open statement and that the hypotheses of
induction hold, that is, P(1) is true and P(k) ⇒ P(k + 1) for k ≥ 1. Then the first
hypothesis of Theorem 4.6 (that P(1) is true) holds. Suppose that P(1), ..., P(k) are
true. Then in particular, P(k) is true. Since the hypotheses of induction hold, it
follows that P(k + 1) is true, and we are done. ∎


Strong induction is important not so much as a separate technique of proof (by 
constructing our propositions carefully, we can avoid using it explicitly), but as a 
signpost to bigger and better things. If we rephrase strong induction in the 
language we first used to describe induction itself, it would look like this: 


If S ⊆ N is such that 1 ∈ S and, for each n > 1, {k : k < n} ⊆ S ⇒ n ∈ S, then S = N.


Notice that this statement makes sense with N replaced by any well-ordered set 
and 1 replaced by the least element of the set (and there are well-ordered sets 
that are bigger and more complicated than we can possibly imagine just now). 
The resulting statement is a very deep and powerful tool called transfinite 
induction. 


EXERCISES 4.5 


1. Carefully state the definition of well-ordering in terms of the definition of an 
ordering based on sets of ordered pairs. 


2. Show that the two parts of Definition 4.2 are equivalent (that is, if m < n
according to one definition, then m < n according to the other). 


3. Verify the claim that {p ∈ Q : p > 9} does not have a least element.


4. Show that the real numbers, rational numbers, and integers are not well- 
ordered in their usual orderings. 


5. What is the smallest natural number that can be written as a sum of three 
primes but can't be written as a sum of two primes? 


6. Construct a proof of Theorem 2.4 based on the fact that every finite set has a
largest element.


7. Carefully state and prove that "Induction implies well-ordering."


8. (a) Explain the bad example of induction. 


(b) Explain why the example just preceding it—in which much the same 
thing seems to be done—is valid. 


9. (a) Explain why "strong induction" is a weaker theorem than induction. 
(b) Show that ((X ⇒ Z) ⇒ (Y ⇒ Z)) ⇔ (Y ⇒ X).


10. (a) Show that if S is a set with n elements, then the power set of S has 2ⁿ
elements (this is why it is called the power set).


(b) Show that n < 2ⁿ for all n ∈ N.


11. (a) Show that the following expression is an integer for n = 0, 1, ...


\[ \frac{1}{\sqrt{5}} \left( \left( \frac{1 + \sqrt{5}}{2} \right)^{n} - \left( \frac{1 - \sqrt{5}}{2} \right)^{n} \right) \]


(It might help to do some experiments with a calculator first.) 


(b) If xₙ is the expression in (a), show that


This number is sometimes called the "golden ratio." It was considered by the 
ancient Greeks to represent a perfect proportion, and much of their art and 
architecture include shapes that contain it. 


12. Let P(n) be the statement "n² + 9n + 5 is even."
(a) Show that P(k) ⇒ P(k + 1) for k ≥ 1.
(b) For which n is P(n) true? 
(c) What went wrong? 


13. (a) Show that any subset of an initial segment of N is finite. (Be very
careful with the definitions here; this is not as simple as it looks. The word
"finite" is a clue.)


(b) Show that a subset of a finite set is finite. 


14. (a) Prove this useful variation on induction: If P(n) is an open statement
whose domain is Z and if
(i) P(n*) is true for some n*
and (ii) P(k) ⇒ P(k + 1) whenever k ≥ n*,
then P(n) is true for all n ≥ n*.


(b) Find all values of n for which n² < 2ⁿ. Prove your result.


15. (a) Suppose that S is a subset of N with the properties:


(i) 2ⁿ ∈ S for n = 1, 2, ....
and (ii) If k ∈ S and k > 1, then k − 1 ∈ S.
Show that S = N.
(b) Condition (i) may be phrased “P = {2ⁿ : n = 1, 2, ...} ⊆ S.” State a condition
on (as opposed to a description of) the set P that would yield the same result.


(c) Could the condition “k − 1 ∈ S” be replaced by “k − 2 ∈ S” and keep the
result?


16. (a) If x₁, x₂, ..., xₙ are all nonnegative, their geometric mean is defined to
be G(x₁, x₂, ..., xₙ) = ⁿ√(x₁x₂⋯xₙ) (note that, strictly speaking, G is not a single
function but a collection of functions). Show that G is similar to the ordinary
(algebraic) mean in the sense that

(i) min{x₁, x₂, ..., xₙ} ≤ G(x₁, x₂, ..., xₙ) ≤ max{x₁, x₂, ..., xₙ}.
(ii) There is equality on either side above if and only if x₁ = ... = xₙ.
(iii) G(x₁, x₂, ..., xₙ, G(x₁, x₂, ..., xₙ)) = G(x₁, x₂, ..., xₙ).

(First show that statements like these hold for the algebraic mean, then
show them for the geometric mean.)


(b) Show that, if x_1, x_2, ..., x_n are all nonnegative, then


(x_1 x_2 ⋯ x_n)^(1/n) ≤ (x_1 + x_2 + ⋯ + x_n)/n.


(This is called the algebraic-geometric mean inequality.)


17. Recall that the binomial coefficients are given by: (n choose k) = n!/(k!(n − k)!).


(a) Show that binomial coefficients satisfy the equation 


(n choose k) + (n choose k + 1) = (n + 1 choose k + 1).
(b) This equation is closely related to Pascal's triangle. Explain. 


(c) Verify the claim made in Exercise 2.2.7: The coefficient of x^k in (1 + x)^n
is (n choose k).


18. (a) Show that 1 + 3 + ... + (2n − 1) = n² for n = 1, 2, ....

(b) Show that 1/(1·2) + 1/(2·3) + ⋯ + 1/(n(n + 1)) = n/(n + 1) for n = 1, 2, ....
19. If a > −1, show that (1 + a)^n ≥ 1 + na. This is called Bernoulli's inequality.
20. Show that 7^n − 6n − 1 is divisible by 36 for n = 1, 2, ....


21. (a) Show that, for any natural number n,


(1 × 2) + (2 × 3) + ⋯ + n(n + 1) = n(n + 1)(n + 2)/3.


(b) Show that, for any natural number n, 


(c) Do some experiments and make a guess of a formula for the sum of the 
first n fourth powers. Prove your result. (Is there a pattern in the formulas for 
the sums of first, second, and third powers? It is clearly a polynomial in the 
variable n. What is its degree? Such a formula is called a "closed form" for 
the sum.) 


22. Bees have an unusual biology. A male bee has only one parent (the queen),
while female bees have two parents. Starting with a male bee, count the
number of its parents, grandparents, great-grandparents, and so on.


23. (a) Suppose that for each m, n ∈ N, P(m, n) is an open statement with
variables m and n. Establish "double induction": If


(i) P(1, 1) is true,
(ii) P(m, k) ⇒ P(m, k + 1) for any m and k = 1, 2, ...,
and (iii) P(k, n) ⇒ P(k + 1, n) for any n and k = 1, 2, ...,
then P(m, n) is true for all m and n.


(b) Show that 2^m 2^n = 2^(m+n) for all natural numbers m and n.


24. (a) Suppose A = {a_1, a_2, ..., a_n} is a finite set of real numbers. Show that it
need not be the case that a_1 < a_2 < ... < a_n (a simple example will do).
(b) Show that such a set can be renumbered in such a way that a_1 < a_2 < ... <
a_n (that is, show that the function f : {1, ..., n} → A in the definition of finite
can be taken to be increasing).


(c) Suppose S is a denumerable subset of the real numbers. Can the function
f : N → S that establishes that S is denumerable always be taken to be
increasing? (That is, can a denumerable set always be numbered in increasing
order?)


25. Show that |sin nx| ≤ n|sin x| for all x and n = 1, 2, ....


26. If x_1, x_2, ..., x_n are all positive, their harmonic mean is:


H(x_1, x_2, ..., x_n) = n / (1/x_1 + 1/x_2 + ⋯ + 1/x_n)
(It is the reciprocal of the average of the reciprocals.)


(a) Show that the harmonic mean satisfies conditions (i), (ii), and (iii) of
Exercise 4.5.16.a.


(b) By doing some experiments, guess an arithmetic-harmonic mean 
inequality. Prove your result. 


(c) How about a harmonic-geometric mean inequality? 


4.6 ORDERED FIELDS 


We had to know a good bit about the natural numbers to define the ordering on 
them. It would be an enormous undertaking to specify the truth value of a < b 
for all pairs of real numbers. Furthermore, we have a secondary goal to consider. 
Whatever ordering we devise for the real numbers should be closely related to 
their field structure. Though this consideration may seem to make the problem 
more difficult, it actually allows us to narrow our focus. Since we can subtract in 
a field, it isn't necessary to assign a truth value to a < b for every a and b, but
only to the expression x > 0 for each x (we would certainly want a < b to have
the same truth value as b − a > 0).


DEFINITION 4.7: (a) Let P be a nonempty subset of a field F. Suppose


(i) If a ∈ P and b ∈ P, then ab ∈ P and a + b ∈ P
and (ii) For each x ∈ F, exactly one of x ∈ P, x = 0, or −x ∈ P holds.
Then P is called a positive set.


(b) A pair (F, P), where F is a field and P is a positive set, is called an ordered
field.


(c) In an ordered field, we say a < b if b + (−a) ∈ P.


If x ∈ P, we say x is positive, and if −x ∈ P, we say x is negative. Note that the
ordering on an ordered field is linear. This definition is related to the field
structure in a big way. The only substantial part of the definition of a field we
don't see is the multiplicative inverse.
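The two conditions of Definition 4.7 can be spot-checked by machine on a small sample. The sketch below is only an illustration, not a proof: it assumes that the candidate positive set in Q is "x > 0" as Python's `Fraction` type understands it (so the ordering is borrowed, not constructed), and the names `is_positive` and `less_than` are hypothetical helpers introduced here.

```python
from fractions import Fraction
from itertools import product

def is_positive(x):
    """Membership test for the candidate positive set P of Q (assumed: x > 0)."""
    return x > 0

def less_than(a, b):
    """a < b in the sense of Definition 4.7(c): b + (-a) lies in P."""
    return is_positive(b + (-a))

# A small finite sample of rationals; a spot check, not a proof.
sample = [Fraction(n, d) for n in range(-3, 4) for d in (1, 2, 3)]

# Condition (i): P is closed under products and sums.
closed = all(
    is_positive(a * b) and is_positive(a + b)
    for a, b in product(sample, repeat=2)
    if is_positive(a) and is_positive(b)
)

# Condition (ii): exactly one of x in P, x = 0, -x in P holds.
trichotomy = all(
    [is_positive(x), x == 0, is_positive(-x)].count(True) == 1
    for x in sample
)

print(closed, trichotomy)
```

Note that `less_than` never compares a and b directly; it only asks whether b + (−a) is in P, which is the narrowing of focus the definition is designed to achieve.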


A precise definition of a positive set in the real numbers is given in Chapter 
22. For now, we will have to view statements about the positive set in the real 
numbers as we have statements about the field structure of the real numbers: We 
know that they will be proved later. Suffice it to say that the positive set of the 
real numbers is pretty much what we think it is. We spent years in elementary 
school looking at theorems about the ordering of the real line. Here we will 
select a few that are of particular importance and that illustrate the interplay of 
algebra and order. 


THEOREM 4.8: If (F, P) is an ordered field, a ∈ F, and a ≠ 0, then a² ∈ P.


PROOF: Since a ≠ 0, either a ∈ P or −a ∈ P. In the first case, the result follows
from the definition of P. In the second case, note that a² = (−a)² ∈ P (by Exercise
$11). ∎


COROLLARY 4.9: In an ordered field, 1 ∈ P.


PROOF: Since P ≠ ∅ and 0 ∉ P, an ordered field has more than one element. By
Exercise 3.2.9, we have 1 ≠ 0, and so 1 = 1² ∈ P. ∎


THEOREM 4.10: The product of a positive element of an ordered field and a 
negative element is negative. 


PROOF: If a ∈ P and −b ∈ P, then −(ab) = a(−b) ∈ P. ∎


COROLLARY 4.11: In an ordered field, x ∈ P if and only if x⁻¹ ∈ P.
PROOF: Left as Exercise 4.6.5. ∎


We can use these results to show that the real numbers and the complex numbers 
are different: By Corollary 4.9, we have −1 ∉ P in any ordered field. In C,
though, −1 is a square (−1 = i²), and so by Theorem 4.8, we should have −1 ∈ P,
a contradiction. We have not only shown that our usual idea of ordering doesn't 
work in C. We've shown that it is impossible to define an ordering that makes C 
into an ordered field. 


Most of what we learned in our youth about the ordering of the real numbers 
is contained in the next theorem. 


THEOREM 4.12: Let a, b and c be elements of an ordered field. 
(a) If a < b, then a + c < b + c.

(b) If a < b and c ∈ P, then ac < bc.

(c) If a < b and −c ∈ P, then bc < ac.

(d) If a < b, then a < (a + b)/2 < b.

(e) If a < b and b < c, then a < c.


PROOF: We will prove parts (a) and (d). (a) We want (b + c) − (a + c) ∈ P. But
(b + c) − (a + c) = b − a ∈ P by hypothesis.

(d) Using part (a) twice, we have: 2a = a + a < a + b < b + b = 2b. Observe that
2 = 1 + 1 ∈ P. By Corollary 4.11, 2⁻¹ ∈ P, and so by part (b) we may divide by 2
to obtain the result. ∎


So far, we have tended to consider the natural numbers as something apart from
the other structures we've discussed. We can bring the discussions of the natural
numbers and ordered fields together by observing that every field contains the
elements: 1, 1 + 1, 1 + 1 + 1, .... In some fields, these elements aren't all
different (in Z_2, for instance, we have 1 + 1 + 1 = 1). The next theorem tells us
that this can't happen in an ordered field, and consequently that every ordered
field contains a "copy" of N.
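To see the repetition concretely, the following sketch lists the elements 1, 1 + 1, 1 + 1 + 1, ... in Z_p. It assumes the usual model of Z_p as the integers reduced modulo p; the function name `repeated_sums_of_one` is a hypothetical helper, not notation from the text.

```python
def repeated_sums_of_one(p, count):
    """Return the first `count` elements 1, 1+1, 1+1+1, ... computed in Z_p."""
    sums = []
    total = 0
    for _ in range(count):
        total = (total + 1) % p  # add the multiplicative identity, reduce mod p
        sums.append(total)
    return sums

print(repeated_sums_of_one(2, 5))  # [1, 0, 1, 0, 1]
print(repeated_sums_of_one(5, 7))  # [1, 2, 3, 4, 0, 1, 2]
```

In each case the list eventually returns to a value it has already taken, which is exactly what cannot happen in an ordered field.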


THEOREM 4.13: In an ordered field, the elements 1, 1 + 1, 1 + 1 + 1, ... are
all positive and all different.


PROOF: Consider the open statement P(n): (Σ_{j=1}^{n} 1) ∈ P.³ Notice that the 1
that is the lower limit of the summation is a natural number, while the 1 that is
inside the summation is the multiplicative identity of the field.


P(1): 1 ∈ P is true [Corollary 4.9]


Assume P(k): (Σ_{j=1}^{k} 1) ∈ P.


Want P(k + 1): (Σ_{j=1}^{k+1} 1) ∈ P.


Now Σ_{j=1}^{k+1} 1 = (Σ_{j=1}^{k} 1) + 1, and so (Σ_{j=1}^{k+1} 1) ∈ P since it is the sum of two
elements of P. The result follows by induction. To see that these elements are all
different, observe that the difference of any two of them is either another one of
them or the additive inverse of one of them. Any such difference is either
positive or negative (that is, not 0). ∎


COROLLARY 4.14: An ordered field is infinite. ∎

We may refer to the elements 1 + 1, 1 + 1 + 1, ..., of an ordered field as 2, 3, ....
Note that a field that contains such a copy of N also contains copies of Z and Q,
and so it makes sense to refer to "integers" and "rational elements" in any
ordered field.


EXERCISES 4.6 
1. Show that the ordering on an ordered field is linear. 


2. (a) Suppose a and b are elements of an ordered field. Show that a² + b² = 0 if
and only if a = b = 0.


(b) Does this remain true if the field is not assumed to be ordered? 


3. (a) None of the fields Z_p can be ordered (why?). Give an example of a value
of p and elements a and b of Z_p such that a ≠ 0 and b ≠ 0 but a² + b² = 0.


(b) Are there values of p for which the situation in (a) can't happen? Consider 
a characterization of all values of p for which this is possible (or impossible). 


4. In the complex numbers, we say the distance from a + bi to 0 is √(a² + b²). We
might define an ordering on C by saying z < w if z is closer to 0 than w is. Show
that this does not make C into an ordered field.


5. Prove Corollary 4.11. 
6. Complete the proof of Theorem 4.12. 
7. Complete the induction in the proof of Theorem 4.13. 


8. Show that the field of rational numbers has the following property: Given
any rational number r = p/q, there is a natural number n_r with r < n_r. (This is
not difficult. Simply construct n_r from r.) An ordered field in which the
natural numbers are distributed in this way is said to have the Archimedean
property. This will be carefully defined and discussed in Chapter 6, where
we will show that it also holds for the real numbers.


9. Show that {r : r = p/q, where p, q ∈ Z have the same sign} is a positive set
on Q.
10. (a) Show that {x ∈ R : ∃ p, q ∈ Q ∋ (p, q > 0 and x ∈ (p, q))} is a positive set on R.


(b) Explain why this couldn't be used to define a positive set on R. 


11. Consider the field M, defined in Exercise 3.2.14. 


(a) We might try to define a positive set on M, by saying one of these 


matrices is to be "positive" if both a and b are positive. Show that this doesn't 
make M, into an ordered field. 


(b) What, if anything, does Exercise 3.2.14.c say about the possibility of 
defining a positive set on M,? 


12. (a) Show that in Z_5 we have 4 = −1 (remember what −1 means!).
(b) Find an element of Z_5 that satisfies the equation x² = −1.
(c) What does part (b) say about the possibility of making Z_5 into an ordered
field?
(d) But isn't 0 < 1 < 2 < 3 < 4 a perfectly good ordering on Z_5? Explain.


13. Suppose a and b are real numbers such that for every ε > 0, a + ε > b. Show
that a ≥ b. Show that it is not necessarily the case that a > b.


14. (a) Explain why a field that contains a copy of N also contains copies of Z 
and of Q. 


(b) Show that C contains copies of N, Z, and Q (identify them exactly). 
Nevertheless, C cannot be ordered. Is there a contradiction here? 


(c) The set of integers contains a perfectly good copy of N, but does not 
contain a copy of Q. How can this be? 


15. Can a field be ordered in more than one way? That is, can there be two sets
P_1 ≠ P_2 that both satisfy the definition of a positive set?


4.7 ABSOLUTE VALUE AND DISTANCE 


Since each nonzero element of an ordered field is either positive or negative, we 
may make the following definition. 


DEFINITION 4.15: The absolute value of an element x of an ordered field (F,
P) is given by:


|x| = x if x ∈ P or x = 0,
|x| = −x if −x ∈ P.


It doesn't matter whether we define |0| to be 0 or —0. The absolute value function 
is familiar from algebra and calculus. We will state only one theorem, leaving its 
proof as Exercise 4.7.1 (good practice in proof by cases): 


THEOREM 4.16: (The Triangle Inequality) If x and y are elements of an
ordered field then


||x| − |y|| ≤ |x + y| ≤ |x| + |y|. ∎


DEFINITION 4.17: If x and y are elements of an ordered field, the distance
between x and y is given by |x − y|.


The measurement of distances is crucial to our study, since we are often
concerned whether points are close together. We may restate the right half of the
Triangle inequality in this way: If x, y, and z are elements of an ordered field,
then |x − y| ≤ |x − z| + |z − y| (this is the symbolic formulation of the phrase "The
shortest path between two points is a straight line").
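As a quick sanity check of this restatement, the sketch below tests the inequality on every triple drawn from a small grid of rational points. It is an illustration under stated assumptions, not a proof: the grid is arbitrary, Python's built-in `abs` on `Fraction` values stands in for Definition 4.15, and `distance` is a hypothetical helper name.

```python
from fractions import Fraction
from itertools import product

def distance(x, y):
    """Distance between x and y in the sense of Definition 4.17: |x - y|."""
    return abs(x - y)

# Rational points -2, -7/4, ..., 7/4, 2.
points = [Fraction(n, 4) for n in range(-8, 9)]

# Check |x - y| <= |x - z| + |z - y| for every triple from the grid.
holds = all(
    distance(x, y) <= distance(x, z) + distance(z, y)
    for x, y, z in product(points, repeat=3)
)
print(holds)
```

Choosing z between x and y gives equality; any detour through a z outside that stretch makes the right-hand side strictly larger.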


We can begin to see the difference between two different approaches to 
analysis. "Hard" analysis is greatly concerned with the proof and application of 
inequalities, while "soft" analysis (the kind we will be doing) is not. The 
Triangle inequality is about the only inequality we will need. 


EXERCISES 4.7 


1. (a) Prove the Triangle inequality. 


(b) Under what circumstances can the ≤ signs in the Triangle inequality be
replaced with = signs?


(c) Show that |−a| = |a|.


(d) Show that |ab| = |a||b| and that |a/b| = |a|/|b| as long as b ≠ 0.
(e) Show that |a| = √(a²) for all a.


2. Use the left side of the triangle inequality to show that addition is continuous
in the following sense. For any a ∈ R, define the function f_a by: f_a(x) = a + x.
Then f_a is continuous for any a.


3. If we let d(x, y) = |x − y|, show that d is a metric in the sense of linear
algebra, that is:


(i) d(x, y) ≥ 0 for all x, y, and d(x, y) = 0 if and only if x = y.
(ii) d(x, y) = d(y, x)
and (iii) d(x, z) ≤ d(x, y) + d(y, z)


4. If x = (x_1, x_2) ∈ R² and y = (y_1, y_2) ∈ R², let D_1 : R² × R² → R and
D_2 : R² × R² → R be given by


D_1(x, y) = |y_1 − x_1| + |y_2 − x_2|


and


D_2(x, y) = √((y_1 − x_1)² + (y_2 − x_2)²).


Show that D_1 and D_2 are both metrics on R².


4.8 INTERVALS 


Linearly ordered sets have a very useful type of subset called intervals. We are, 
of course, only interested in intervals of real numbers and (on rare occasions) of 
rational numbers. 


DEFINITION 4.18: Let S be a linearly ordered set. The set I ⊆ S is an interval
if I has one of the following forms:⁵


(1) {x ∈ S : a < x < b}     denoted (a, b)

(2) {x ∈ S : a ≤ x ≤ b}     [a, b]

(3) {x ∈ S : a ≤ x < b}     [a, b)

(4) {x ∈ S : a < x ≤ b}     (a, b]

(5) {x ∈ S : a < x}         (a, ∞)

(6) {x ∈ S : x < b}         (−∞, b)

(7) {x ∈ S : a ≤ x}         [a, ∞)

(8) {x ∈ S : x ≤ b}         (−∞, b]

(9) all of S (sometimes)    (−∞, ∞)


The symbol ∞ should not be endowed with any meaning except that given it
here. It is particularly important to remember that this symbol does not represent
an element of S. We insist that S be linearly ordered mainly because the sets that
are of interest to us are, though the linear ordering does allow us to write S =
(−∞, ∞). We are concerned primarily with intervals of the first and second types,
called open and closed, respectively. Sets of the forms (5), (6), (7), and (8) are
sometimes called rays.


Open intervals play a crucial role in defining an idea of "nearness" on the real 
line. Closed intervals will attract our attention less often, but when they do, it 
will be in a pivotal way. Here we will establish some technical results. These are 
chosen not so much for their depth but because they will be useful later. They are 
stated for intervals of the form (a, b) but can be modified to apply to intervals of 
other types. The first of these concerns the relationship between intervals and the 
measurement of distance in the real numbers and gives us a hint how we might 
use intervals to define nearness. 


THEOREM 4.19: If x and y are elements of (a, b), then |x − y| < b − a.


PROOF: We may suppose that a < y ≤ x < b, so that |x − y| = x − y. Then x − y
< b − y < b − a, by Theorem 4.12. ∎


THEOREM 4.20: If x ∈ (a, b) ∩ (c, d), and d − c < min{b − x, x − a}, then
(c, d) ⊆ (a, b).


PROOF: Let y ∈ (c, d) and suppose that x ≤ y. By Theorem 4.19, y − x = |x − y|
< d − c < min{b − x, x − a} ≤ b − x. Thus a < x ≤ y < b, and so y ∈ (a, b), as
desired. A similar argument holds if y < x. ∎


4.9 WHEN SHOULD WE DRAW PICTURES? 


Though the proof of Theorem 4.20 is brief, it may not be entirely clear what is 
going on in it, or what it is really about. This can be cleared up with a few 
pictures. First, we look at a situation where the result does not hold: 


and one where it does: 


Note that min{b—x, x—a} is the distance from x to the nearer endpoint of (a, b). 
In the second picture, the inner interval isn't long enough to reach either end of 
the outer interval. This is what Theorem 4.20 is about. 
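The two situations can also be tried with numbers. The sketch below is an illustration only: the endpoints are arbitrary choices, the helper names `hypothesis_4_20` and `is_subinterval` are hypothetical, and `is_subinterval` assumes c < d and uses the endpoint comparison a ≤ c and d ≤ b for open intervals of real numbers.

```python
def hypothesis_4_20(a, b, c, d, x):
    """True when x lies in both (a, b) and (c, d) and d - c < min{b - x, x - a}."""
    in_both = a < x < b and c < x < d
    return in_both and (d - c) < min(b - x, x - a)

def is_subinterval(c, d, a, b):
    """True when the nonempty open interval (c, d) is contained in (a, b).

    Assumes c < d and intervals of real numbers.
    """
    return a <= c and d <= b

# Outer interval (0, 10) and common point x = 5, so min{b - x, x - a} = 5.

# Inner interval (4, 12) has length 8, which is not less than 5:
# the hypothesis fails, and the inner interval reaches past the right end.
print(hypothesis_4_20(0, 10, 4, 12, 5), is_subinterval(4, 12, 0, 10))

# Inner interval (3, 6) has length 3 < 5:
# the hypothesis holds, and the inner interval stays inside.
print(hypothesis_4_20(0, 10, 3, 6, 5), is_subinterval(3, 6, 0, 10))
```

The hypothesis is sufficient but not necessary: (1, 9) is contained in (0, 10) and contains 5, yet its length 8 exceeds 5, so Theorem 4.20 says nothing about it.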


How useful are pictures? We know that we generally can't use them to prove 
things, but they can certainly help us get ideas. We could have introduced 
Theorem 4.20 with a question: "Give a condition that will guarantee that
(c, d) ⊆ (a, b)." After drawing a few sketches like these, we would arrive at
something like Theorem 4.20. We should never hesitate to draw sketches but 
must keep in mind their limitations. The main object of our study, the real 
number line, is not exactly the stuff of striking artwork, and so all our pictures 
will look very much like these two. 

More importantly, when should we draw a picture? You will notice that 
proofs almost never start with a picture. How do we decide when it is time to 
draw one? There are, as you might guess, no rules to help us with this, but there 
is one very useful guideline: Draw a picture when you have something to draw! 
When you reach an existentially quantified step in a proof (that is, when you are 
sure that something with specific properties exists), draw it, but be careful to 
include in your picture only those features that your current knowledge allows. 


EXERCISES 4.9 


1. (a) Show that, if a and b are elements of a linearly ordered set S and b > a,
then (a, ∞) ∪ (−∞, b) = S.
(b) Show that this does not necessarily hold if the ordering of the set is not
linear. This is why we need S to be linearly ordered to say that S = (−∞, ∞).


2. (a) Show that if c < d and (a, b) contains neither c nor d, then either
(a, b) ∩ (c, d) = ∅ or (a, b) ⊆ (c, d).


(b) Suppose that A and B are open intervals such that neither contains an
endpoint of the other. Show that A and B are either disjoint or identical. (This
result will play an important role in the proof of Theorem 8.11.) 


3. Suppose S is a set with the property that |x − y| ≥ 1 for any two different
elements x and y of S. Show that an interval (a, a + 1) can contain at most 
one element of S. Does this result change if a half-open or closed interval is 
used instead of an open interval? 


4. (a) If I = (a, b) and J = (c, d) are open intervals, show that I ⊆ J if and only if
a ≥ c and b ≤ d.


(b) Does this result change if the intervals are not open? 

5. (a) Show that the intersection of two intervals is an interval. 
(b) If I and J are intervals with I ∩ J ≠ ∅, show that I ∪ J is an interval.
(c) Is the condition "I ∩ J ≠ ∅" necessary in (b)?


(d) Fill in the blank: If I and J are intervals, then I ∪ J is an interval if and
only if ________.
(e) Reinterpret (a) through (d) with "interval" replaced by "ray." 


6. Modify Theorems 4.19 and 4.20 to allow for intervals that are not open. 


7. In the proof of Theorem 4.19, why can we suppose that y ≤ x?


4.10 NEIGHBORHOODS 


The next theorem gives us a useful way of describing intervals. Instead of giving 
the endpoints of an interval, we may specify a "center" and a "radius." We will 
often be concerned with intervals that are, so to speak, "short," and the ε in the
following theorem is an easy way of measuring this. We will consider two points 
to be close together if they are in many of the same short intervals. This theorem 
also shows one way in which the order and distance-measuring (or "metric") 
structures of the real line are related. 


THEOREM 4.21: If a < b, let c = (a + b)/2 and ε = (b − a)/2. Then (a, b) = {x :
|x − c| < ε}.


PROOF: It seems that c is exactly in the middle of a and b, and that ε is half the
distance from a to b (these guesses are confirmed in the proof):


Observe first that this is a set-equality problem, and so the structure of the proof
is determined for us. Let x ∈ (a, b) and suppose x ≥ c. Then x ∈ [c, b),⁶ and so |x −
c| < b − c = (b − a)/2 = ε. Thus x ∈ {t : |t − c| < ε} (and similarly if x < c). Now
suppose x ∈ {t : |t − c| < ε} and x ≥ c. Then |x − c| = x − c < ε, and so
a < c ≤ x < c + ε = b, and so x ∈ (a, b) (and similarly if x < c). ∎
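The change of description in Theorem 4.21 is easy to try out. The following sketch converts endpoints to a center and radius and compares the two membership tests on a grid of rational points; the interval (1, 4), the grid, and the helper name `center_and_radius` are illustrative assumptions, and agreement on a finite grid is of course not a proof.

```python
from fractions import Fraction

def center_and_radius(a, b):
    """Return (c, eps) with c = (a + b)/2 and eps = (b - a)/2, as in Theorem 4.21."""
    c = (a + b) / 2
    eps = (b - a) / 2
    return c, eps

a, b = Fraction(1), Fraction(4)
c, eps = center_and_radius(a, b)
print(c, eps)  # 5/2 3/2

# Compare "a < x < b" with "|x - c| < eps" on the points 0, 1/8, ..., 5.
sample = [Fraction(n, 8) for n in range(0, 41)]
agree = all((a < x < b) == (abs(x - c) < eps) for x in sample)
print(agree)  # True
```

The grid deliberately includes the endpoints 1 and 4, where both tests must fail because both inequalities are strict.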


DEFINITION 4.22: An interval of the form {x : |x − c| < ε} for some real
number c and some positive real number ε is called an ε-neighborhood of c (or
an ε-interval around c).


An ε-neighborhood consists, roughly, of all points near its center. This is such a
useful property that we will often forgive a set for containing other points. This
spirit of forgiveness leads us to the following definition.


DEFINITION 4.23: The set U is a neighborhood of c if there exists ε > 0 so
that U contains the ε-neighborhood of c.


Note that every ε-neighborhood of a point is a neighborhood of it, but not every
neighborhood is an ε-neighborhood. We can use an ε-neighborhood as an
example in an existential proof, but in a universal proof we may not assume that
a set identified only as a neighborhood is necessarily an ε-neighborhood.


EXAMPLES 4.10: 1. (0, 1) is a neighborhood of 1/4, but is not an ε-neighborhood
of 1/4. The 1/8-interval around 1/4 is contained in (0, 1). Note that
any ε ≤ 1/4 will work in this argument, but since the definition of neighborhood
is existential, we only need to find one such value. The interval (0, 1) is an
ε-neighborhood of 1/2, with ε = 1/2.


2. The closed interval [0, 1] is also a neighborhood of 1/4 (by the same argument
as above) but it is not a neighborhood of 0. Any ε-neighborhood of 0 contains
negative numbers (−ε/2, for one) and so is not contained in [0, 1]. If a set is a
neighborhood of a point, the point must be an element of the set, but a set can
contain points of which it is not a neighborhood.


3. The set (0, 1) ∪ {2, 3, 4, ...} is also a neighborhood of 1/4, but is not a
neighborhood of any of 2, 3, 4, ....


Example 1 is a special case of a very general result:


THEOREM 4.24: An open interval is a neighborhood of each of its points. 


PROOF: Let x ∈ (a, b) and ε = min{x − a, b − x}. Then (x − ε, x + ε) ⊆ (a, b) by
Theorem 4.20. Since such an ε exists, (a, b) is a neighborhood of x. ∎
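The choice of ε in this proof is completely explicit, so it can be computed. The sketch below is a minimal illustration; the function name `neighborhood_radius` is hypothetical, and the sample values are the ones from Examples 4.10.

```python
from fractions import Fraction

def neighborhood_radius(a, b, x):
    """Return eps = min{x - a, b - x} for a point x of the open interval (a, b).

    Raises ValueError when x is not in (a, b), since no radius is claimed there.
    """
    if not (a < x < b):
        raise ValueError("x must lie in the open interval (a, b)")
    return min(x - a, b - x)

a, b, x = Fraction(0), Fraction(1), Fraction(1, 4)
eps = neighborhood_radius(a, b, x)
print(eps)  # 1/4

# The eps-interval around x is (x - eps, x + eps) = (0, 1/2), which stays inside (0, 1).
print(a <= x - eps and x + eps <= b)  # True
```

Asking for a radius at x = 0 raises the error: 0 is not a point of (0, 1), and Example 2 shows that even [0, 1], which does contain 0, is not a neighborhood of it.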


EXERCISES 4.10 


1. Suppose U is a neighborhood of a point x and that U ⊆ V. Show that V is a
neighborhood of x. 


2. (a) Suppose U and V are neighborhoods of a point x. Show that U ∩ V and
U ∪ V are neighborhoods of x.


(b) Show that (a) remains true for a finite collection of neighborhoods. 


(c) Does (a) remain true for an infinite collection of neighborhoods? 
3. Show that any set that is a neighborhood of some point is uncountable. 


4. If x ≠ y, show that there are neighborhoods U of x and V of y such that
U ∩ V = ∅.


5. Draw a sketch to illustrate the proof of Theorem 4.24. 


1 The symbol ≤ means just what we think it does: a ≤ b if a < b or a = b.
2 The least element of a set was defined in Exercise 4.1.1. 


3 The colon in this line only serves to separate the name of P(n) from the statement of P(n). One must 
be careful not to put an "=" here. 


4 These are called the Fibonacci numbers. The first few are 1, 1, 2, 3, 5, 8, 13, .... These numbers are 
of great interest in both serious and recreational number theory and pop up with eerie regularity in 
descriptions of natural processes. 


5 We usually assume that a < b unless we specifically state otherwise. However, if a = b then [a, b] =
{a}, and if b < a we have (a, b) = ∅, both of which make sense. This means that ∅ and a set consisting of a
single point are intervals.


6 We use a modification of Theorem 4.19 here. 


Part Two 


The Structure of the Real Number System 


We have found that the real numbers and the rational numbers have much in 
common in that they are both ordered fields. We turn now to the question that is 
the central theme of the book: How are these two fields different? We saw in 
Chapter 2 that a certain set of decimals has greater cardinality than the set of 
rational numbers. Perhaps this settles the issue. There would seem to be more 
real numbers than there are rational numbers. But how are those decimals related 
to the real numbers? This is a tough question (we will answer it in Chapter 6). 


The difference between the real and rational numbers is more than a mere 
counting problem. In this part of the book, we will be primarily concerned with 
six properties that the real numbers have but the rational numbers do not. This 
puts us in a peculiar position, which, though we should be aware of it, does not 
affect what we do. We will distinguish an ordered field having these properties 
from one not having them, and we will see that the rational numbers do not have 
these properties. But we will only postulate that there is an ordered field that 
does have them. Think of six descriptions of an imaginary beast. Any animal 
fitting one of your descriptions fits them all. A house cat, for instance, fits none 
of them. Even though you have clearly identified your animal as not a cat, it still 
does not exist! In the end, we can convince doubters by actually capturing one of 
our beasts. Western explorers of Australia vindicated themselves in just this way 
after years of having their tales of the platypus laughed at. We will find, though, 
that our study of the properties we will describe can proceed just as well whether 
or not we have a sample on hand (in this way, mathematics is not like biology). 
We will finally capture the real numbers in the last chapter of the book. 

The properties of the real numbers we will discuss are found in the following 
theorem. PLEASE REMAIN CALM! Theorem R, as we will call it, is the 
concern of all of Part 2 of the book. We shouldn't know what these statements 
mean yet! (Notice that Theorem R includes three "properties," two "theorems," 
one "criterion," and one statement without a fancy title. The differences among 
these names are historical accidents with no intellectual significance.) 


THEOREM R: If F is an ordered field having the Least Upper Bound property,
then F has the Archimedean property and the following results also hold in F. 


(The Least Upper Bound property is discussed in Chapter 5; the Archimedean 
property is discussed in Chapter 6.) 


(a) Every nest of closed, bounded intervals in F has a nonempty intersection. 
(This is called the Nested Intervals property—Chapter 6.) 


(b) Every bounded, infinite subset of F has a cluster point. (This is called the 
Bolzano-Weierstrass theorem—Chapter 7.) 


(c) A sequence in F converges to an element of F if and only if it is a Cauchy 
sequence. (This is called the Cauchy criterion—Chapter 10.) 


(d) A subset of F is compact if and only if it is closed and bounded. (This is 
called the Heine-Borel theorem—Chapter 11.) 


(e) F is connected. (Chapter 12) 


The relationships among the main parts of Theorem R are indicated in the 
following diagram. Each arrow represents one of our proofs (the Archimedean 
property is off to the side of this picture): 


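[Diagram: arrows connecting the Least Upper Bound property to the parts of
Theorem R; the labels that survive are "F is Connected," "Bolzano-Weierstrass
Theorem," and "Cauchy Sequences Converge."]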

We denote the field in Theorem R by F because we are not certain that it is the 
real numbers as we usually think of them. The Nested Intervals property [part (a) 
of Theorem R] will tell us, among other things, that elements of such an ordered
field have decimal expansions and that each decimal expansion corresponds to
an element of the field. After proving this, we may confidently refer to the field 
as the real numbers. 


COMPLETENESS AND THE BIG THEOREM 


Theorem R does not tell the whole story of the real numbers. In the diagram 
above, it appears that everything grows from the Least Upper Bound property 
and that the Heine-Borel theorem, for instance, is not directly related to the 
Bolzano-Weierstrass theorem. But there is much more to the structure of the real 
numbers. The Least Upper Bound property and parts (a) through (e) of Theorem 
R are not just loosely related statements about the real numbers; they are 
equivalent,² that is, they describe the same property of the real numbers. This
property is called completeness, which is, in a word, the answer to the Big 
Question. We may define the real numbers to be a complete, Archimedean 
ordered field. (We saw in Exercise 4.6.8 that the rational numbers also have the 
Archimedean property; it is completeness that makes the real numbers special.) 
We refer to the following as the "Big Theorem" because it is the answer to
the Big Question. The proof of the Big Theorem is the subject of most of Part 2
of the book.


THE BIG THEOREM: If F is an ordered field, the following are equivalent:
(a) F has the Least Upper Bound property.
(b) F has the Archimedean property, and the Nested Intervals property.
(c) F has the Archimedean property, and the Bolzano-Weierstrass theorem holds
in F.
(d) The Heine-Borel theorem holds in F.
(e) F has the Archimedean property, and the Cauchy criterion holds in F.
(f) F is connected.


To show that these statements are equivalent, we must show that each of them
implies each of the others. This would be thirty implications! Fortunately, we
don't have to work quite that hard. A proof could be made with only six
implications, like this: a ⇒ b ⇒ c ⇒ d ⇒ e ⇒ f ⇒ a. Our approach is not this
economical, but if we can establish a chain of implications leading from each 
statement to each of the others, we will have shown that all are equivalent. Our 
proof of the Big Theorem is described in the following diagram, which we will 
call the Big Picture (as in "It is always important to see ... "): 


THE BIG PICTURE


[Diagram: implication arrows among six boxes labeled "F is Connected," "Least
Upper Bound Property," "Bolzano-Weierstrass Theorem," "Nested Intervals
Property," "Cauchy Sequences Converge," and "Heine-Borel Theorem."]
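The count of thirty implications, and the claim that a single cycle of six suffices, can be checked mechanically. The sketch below treats the statements (a) through (f) as nodes and each proved implication as a directed edge, then follows chains of edges. The cycle used is the economical one, a ⇒ b ⇒ c ⇒ d ⇒ e ⇒ f ⇒ a, not the arrangement of proofs actually used in the book, and the helper name `reachable` is hypothetical.

```python
def reachable(edges, start):
    """Return every statement reachable from `start` by following implications."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for src, dst in edges:
            if src == node and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen

statements = "abcdef"

# The economical chain a => b => c => d => e => f => a.
cycle = [(statements[i], statements[(i + 1) % 6]) for i in range(6)]

# Every implication s => t (with s and t different) that follows from the cycle.
derived = {(s, t) for s in statements for t in reachable(cycle, s) if s != t}
print(len(cycle), len(derived))  # 6 30
```

Dropping the last edge f ⇒ a breaks the loop: nothing is then reachable from f, and the statements are no longer shown to be equivalent.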


Since they are equivalent, we may take any of the six main statements of 
Theorem R to be the definition of completeness. Some authors refer to the Least 
Upper Bound property in this way (as will we, most of the time), while others 
use the word in specific reference to the Cauchy criterion. There are good 
reasons for both choices, and in fact we will find just once (in Chapter 15) that 
the latter usage serves us better. 


The Big Theorem is important not so much because it asserts that each of the 
six characterizations of completeness is true (Theorem R tells us that), but 
because it asserts that they are equivalent. However, if one's goal is primarily to 
establish the validity of each of these statements, it is just as well to prove 
Theorem R, whose structure is not quite so elaborate. The proofs that transform 
Theorem R into the Big Theorem are contained in sections entitled "Closing the 
Loop." These sections may be considered optional. 


EXERCISES 


1. Show that "If we can establish a chain of implications leading from each 
statement to each of the others, all of them are equivalent" in the following 
way: 


(a) Show that ((A ⇒ B) and (B ⇒ C)) ⇒ (A ⇒ C).


(b) Explain how this proves the statement. 


2 Their equivalence is conditioned on the somewhat mysterious fluttering about of the Archimedean 
property. How and why the Archimedean property enters this story is a study in itself. We will confine our 
efforts to making sure that the Archimedean property is actually used when we claim it is used in our 
proofs, and not used when we claim it is not. 


Chapter 5 


Upper Bounds and Suprema 


5.1 UPPER AND LOWER BOUNDS 


The ordering of our field allows us to define another important idea—that of an 
upper bound for a set. There are many possible relationships between a number 
and a set. The simplest of these is that a number might be an element of a set or 
it might not. The number 3 is not in the interval [0, 1], but it is larger than
everything in the set, and this tells us something, too. Many arguments rest on 
estimates of the size of the elements of a set. 


DEFINITION 5.1: (a) The number u is an upper bound for the set S if s ≤ u for
each s ∈ S. If S has an upper bound, we say that it is bounded above.


(b) The number w is a lower bound for the set S if s ≥ w for each s ∈ S. If S has a
lower bound, we say that it is bounded below.


So 3 is an upper bound for [0, 1], as are 113, 3/2, 11/10, 1, and infinitely many 
others. We see immediately that if a set has any upper bound, it has infinitely 
many of them. Not every set has an upper bound. Suppose I think the number u 
is an upper bound for the whole real line. You need only point out that u + 1 is a
real number that is larger than u to show me that u is not larger than every real
number and so u is not an upper bound for R. The set of natural numbers is
bounded below, since n > 0 for all n ∈ N. The previous argument can be used to
show that no natural number is an upper bound for N. This does not exclude the 
possibility that there is some real number that is not a natural number but is an 
upper bound for N (be sure you see why this is so). We will show that N is not 
bounded above in the next chapter, but note how much machinery is necessary to 
accomplish this. 


Among the upper bounds of the set [0, 1], the number 1 is special. It is the
only one for which there is no smaller upper bound. This seems obvious, but let's
check it carefully. Suppose u > 1. By Theorem 4.12, 1 < (1 + u)/2 < u, and so (1
+ u)/2 is an upper bound for [0, 1]. But (1 + u)/2 < u, and so there is an upper
bound for [0, 1] that is less than u. Such a u does not have the special property
we're looking for. On the other hand, if u < 1, then u can't be an upper bound for
[0, 1] because 1 ∈ [0, 1] and so u is less than an element of [0, 1]. This special
upper bound, and the analogous special lower bound, are given names in the 
following definition. 


DEFINITION 5.2: (a) The number u is the supremum (or least upper bound)
of the set S if
(i) u is an upper bound for S
and (ii) there is no upper bound for S less than u.


(b) The number w is the infimum (or greatest lower bound) of S if
(i) w is a lower bound for S
and (ii) there is no lower bound for S greater than w.


We will denote the supremum of a set S by sup S (if we say "least upper bound" 
we write lub S), and the infimum by inf S (or glb S). 


EXAMPLES 5.1: 1. We have seen that 1 = sup[0, 1]. 


2. It is also true that 1 = sup(0, 1). Clearly 1 is an upper bound for this set. We
must show that no smaller number is an upper bound, that is, for a given v < 1
we must find t ∈ (0, 1) with t > v. Since 1 ∉ (0, 1), the previous argument doesn't
work. Note that v < (v + 1)/2 < 1. Is (v + 1)/2 the number we're looking for? It is
true that (v + 1)/2 is greater than v, but it is not necessarily in (0, 1). If v = −11,
then (v + 1)/2 = −5, which is not in (0, 1). On the other hand, if v = −11, the
number 1/2 will serve our purpose. Let


    t = (v + 1)/2   if v > 0,
    t = 1/2         if v ≤ 0.


Then t > v and t ∈ (0, 1), and so v is not an upper bound for (0, 1). It follows that
1 = sup(0, 1).
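

The two-case choice of t in this example can be checked mechanically. The following is a minimal sketch in Python, assuming exact rational arithmetic via the standard `fractions` module; the function name `witness` is hypothetical and only illustrates the construction.

```python
from fractions import Fraction

def witness(v):
    """Return t in (0, 1) with t > v, for a given v < 1 (the t of Example 5.1.2)."""
    v = Fraction(v)
    if v >= 1:
        raise ValueError("the construction assumes v < 1")
    # For v > 0 the midpoint of v and 1 works; for v <= 0 the number 1/2 works.
    return (v + 1) / 2 if v > 0 else Fraction(1, 2)

for v in (Fraction(-11), Fraction(0), Fraction(1, 3), Fraction(999, 1000)):
    t = witness(v)
    # t is an element of (0, 1) larger than v, so v is not an upper bound for (0, 1).
    assert v < t and 0 < t < 1
    print(v, "->", t)
```

A finite list of test values proves nothing, of course; the sketch only shows that the case split does what the argument says it does.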


This example also demonstrates the important fact that the supremum of a set 
need not be in the set. If we happen to know that the supremum of a set is an 
element of the set, we will call it the maximum and write max S instead of sup 
S. When we refer to a number as the maximum of a set, we are making two 
claims that must both be verified: (1) That the number is the supremum of the 
set; and (2) that it is an element of the set. If we are certain that the infimum of a
set is an element of the set, we call it the minimum and write min S instead of
inf S. It is true that 1 = max[0, 1], but not that 1 = max(0, 1). "Maximum" and
"minimum" are innocent-sounding words that must be used with caution.


3. Let H = {1, 1/2, 1/3, ...}. By Corollary 4.11 and Theorem 4.13, each element of H
is positive, and so 0 is a lower bound for H. To show that 0 is the infimum of H,
we must establish that there is no lower bound for H greater than 0. That is, we
must show that v > 0 ⇒ ∃n ∈ N ∋ (1/n < v). This appears to be so, but we can't
prove it yet! We will prove it in Chapter 6.


4. The set {r ∈ Q : r > 0 and r² < 2}, thought of as a subset of Q, has no rational
supremum even though it is bounded above. This will be shown in detail in the
proof of Theorem 5.4.

The phrase "u is an upper bound for S" may be written ∀s ∈ S (u ≥ s). Thus the
statement "v is not an upper bound for S" may be written ∃s ∈ S ∋ (v < s). We may
restate the definition of supremum:


u = sup S ⇔ (∀s ∈ S (s ≤ u)) and (v < u ⇒ ∃s ∈ S ∋ (v < s))


(and a similar statement for the infimum). This makes the last part of the 
proof of the following theorem immediate. It will be helpful to have a variety of 
means by which to check that a number is the supremum of a set. 


THEOREM 5.3: u is the supremum of S if and only if, for any ε > 0, it is both
the case that there is no element of S greater than u + ε and that there is an
element of S greater than u − ε.


PROOF: That u = sup S implies the other conditions is clear. Suppose u satisfies
the last two conditions of the theorem. We show first that such a u is an upper
bound for S. Suppose u < x. Then (x − u)/2 > 0, and, by Theorem 4.12, u < (u +
x)/2 = u + (x − u)/2 < x. Taking ε = (x − u)/2 in the hypothesis, we see that x ∉ S.
Thus u is an upper bound for S. The other part of the proof follows from the
comment above. ∎
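

The criterion of Theorem 5.3 must hold for every positive tolerance, and a small experiment shows why that matters. The sketch below is illustrative only: it assumes a finite sample of the set {k/(k + 1)}, exact rational arithmetic, and a hypothetical helper name `satisfies_epsilon_test`.

```python
from fractions import Fraction

def satisfies_epsilon_test(u, S, eps):
    """Check both conditions of Theorem 5.3 for one value of eps > 0."""
    nothing_above = all(s <= u + eps for s in S)    # no element of S exceeds u + eps
    something_near = any(s > u - eps for s in S)    # some element of S exceeds u - eps
    return nothing_above and something_near

S = [Fraction(k, k + 1) for k in range(1, 200)]     # 1/2, 2/3, ..., 199/200
epsilons = [Fraction(1, 10**j) for j in range(6)]   # 1, 1/10, ..., 1/100000

# For this finite sample, max(S) = 199/200 passes for every eps tested ...
print(all(satisfies_epsilon_test(max(S), S, e) for e in epsilons))
# ... while 1 fails once eps <= 1/200, because only finitely many k were sampled.
print([satisfies_epsilon_test(Fraction(1), S, e) for e in epsilons])
```

For the full infinite set {k/(k + 1) : k ∈ N} the number 1 would be the candidate supremum; the finite sample is what makes it fail here, which is the point of testing small tolerances.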


EXERCISES 5.1 


1. Show that if a set has one upper bound, it has infinitely many. 
2. Show that a set can have only one supremum and one infimum. 


3. Show that if a set contains one of its upper (respectively, lower) bounds, then 
that bound is the supremum (respectively, infimum) of the set. 


4. (a) If S ≠ ∅, show that inf S ≤ sup S.
(b) What can be said about the set S if inf S = sup S? 


(c) What are the supremum and infimum of ∅?


5. If S is a nonempty set, u is an upper bound for S, and v is not an upper bound
for S, show that there is an element s ∈ S with v < s ≤ u.


6. (a) If S ⊆ T, show that inf T ≤ inf S and sup S ≤ sup T.


(b) Show that it is possible for the set containment in (a) to be proper without 
either of the inequalities being strict. 


7. Show that a finite set contains its supremum and infimum. 


8. (a) If all the elements of a set S are positive, show that inf S ≥ 0.


(b) Show that a finite set whose elements are all positive has a positive 
infimum (remember, 0 is not positive). 


(c) Show that (b) does not hold if the set is infinite. 


(d) Show that if a set of positive numbers has infimum 0, the set must be 
infinite. 


9. Suppose that sup(A ∪ B) = u and that there is an ε > 0 so that a < u − ε for
all a ∈ A. Show that sup(A ∪ B) = sup B.


10. (a) Show that if u and S are as in Theorem 5.3 and u = sup S, then the other
two conditions in the theorem are met. 


(b) Draw a picture to illustrate the proof of Theorem 5.3. 


11. (a) Let S and T be sets with the property that s ≤ t for each s ∈ S and each
t ∈ T. Show that sup S ≤ inf T.


(b) If S and T are as in (a), show that sup S = inf T if and only if, for any ε >
0, there are elements s ∈ S and t ∈ T with t − s < ε. (Note that t − s ≥ 0.)


(c) Let S and T be sets with the property that, for each s ∈ S, there is a t ∈ T
with s ≤ t and for each t ∈ T there is an s ∈ S with s ≤ t. Show that inf S ≤ inf
T and sup S ≤ sup T.


(d) Can these be replaced with strict inequalities? 


(e) Give a condition that would guarantee strict inequality in (a). 


12. Define a property LR (for "left ray") as follows: The set A has LR if x ∈ A
and y < x imply y ∈ A.


(a) Is a set having LR necessarily nonempty?

(b) If A has LR, x ∈ A and z ∉ A, then z > x.

(c) Does the whole number line have LR?

(d) If A and B have LR, either A ⊆ B or B ⊆ A.

(e) Show that if A ≠ ∅, A has LR, and c = sup A, then either


A = {x : x < c} or A = {x : x ≤ c}.


13. Discuss the relationship between our inability to complete Example 5.1.3 
and our inability to show that the natural numbers are bounded above. 


5.2 THE LEAST UPPER BOUND AXIOM 


In Example 5.1.4 we saw a set of rational numbers that is bounded above but has 
no supremum. This is inconvenient. In the following axiom, we assert that it 
can't happen for a set of real numbers. 


THE LEAST UPPER BOUND AXIOM: Every nonempty subset of the real 
numbers that is bounded above has a least upper bound that is a real number. 


This is not true if "real" is replaced by "rational," as Example 5.1.4 
demonstrates. We have found what we were looking for: a property that 
distinguishes the real numbers from the rational numbers. While it must remain 
an axiom for most of the book, the Least Upper Bound axiom will become a 
theorem in Chapter 22. We will say that an ordered field in which the Least 
Upper Bound axiom holds "has the Least Upper Bound property." Our first use 
of the Least Upper Bound axiom will be to complete Example 5.1.4. 


THEOREM 5.4: (a) There is no rational number whose square is 2. (b) Any 
ordered field having the Least Upper Bound property has a positive element 
whose square is 2. 


PROOF: (a) Suppose (p/q)² = 2, where p and q are integers. Then p² = 2q².
Now p and q can be factored into prime numbers. The factor 2 may or may not
appear in the prime factorization of p, but it must appear an even number of
times (possibly none) in the factorization of p². Likewise, 2 must appear an even
number of times in the factorization of q², and so 2 appears an odd number of
times in the factorization of 2q². The number of factors of 2 in a natural number
can't be both even and odd, and so the assumption that (p/q)² = 2 has led to a
contradiction.


(b) This is more complicated. Let S = {x ∈ F : x > 0 and x² < 2}. We show first
that S has a supremum. Since 1 ∈ S, we have S ≠ ∅. If x > 2, then x² > 4 (by
Theorem 4.12). Such a number is not an element of S. Thus 2 is an upper bound
for S, and S is bounded above. Since F has the Least Upper Bound property, S
has a supremum. The next step is to show that (sup S)² ≥ 2. Our proof will go
like this: We will show that, for any y ∈ F with y² < 2, there is a positive element
of F, say z, with (y + z)² < 2. This says y + z ∈ S, and since y < y + z, we see that
such a y can't be an upper bound for S. It follows that y ≠ sup S and consequently
that (sup S)² ≮ 2. A similar argument (which you will provide in Exercise 5.2.2)
shows that (sup S)² ≯ 2. Combining the two inequalities, we have (sup S)² = 2.


Now we will do the work. We indulge in a little "backward" thinking: We
want (y + z)² = y² + 2yz + z² < 2. This is the same as z(2y + z) < 2 − y², and we
know that 2 − y² is positive. How big is z(2y + z)? Since y ∈ S, we have y < 2.
We must also have y + z < 2, and since y > 0, we may assume z < 2 (remember,
we're working backward). Then 2y + z < 6, and z(2y + z) < 6z. If we can make
6z < 2 − y², we will be done. We can do this by letting z = (2 − y²)/7. (The
calculation should now be rewritten in the proper order.) ∎
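

The choice z = (2 − y²)/7 made at the end of the proof can be tried out numerically. What follows is a minimal sketch, assuming we work in the rational numbers with Python's `fractions` module; the function name `step` is hypothetical.

```python
from fractions import Fraction

def step(y):
    """Given y in S (y > 0 and y*y < 2), return z > 0 with (y + z)**2 < 2."""
    y = Fraction(y)
    if not (y > 0 and y * y < 2):
        raise ValueError("y must be a positive element with y*y < 2")
    return (2 - y * y) / 7           # the choice made at the end of the proof

y = Fraction(1)
for _ in range(5):
    z = step(y)
    # y + z is again in S, so y was not an upper bound for S.
    assert z > 0 and (y + z) ** 2 < 2
    y += z
    print(y, float(y))
```

Each pass produces a larger element of S, which is exactly why no y with y² < 2 can be the supremum. The sketch makes no claim about how fast these values approach the supremum.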

The set {r ∈ Q : r > 0 and r² < 2}, thought of as a subset of Q, is bounded
above but has no supremum. When thought of as a subset of R, though, the
supremum of this set is a positive real number whose square is 2, which we can
call √2.
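

The parity argument in part (a) of the proof of Theorem 5.4 can also be spot-checked. This is an illustrative sketch only, with a hypothetical helper `twos` that counts the factors of 2 in a natural number; a finite check is no substitute for the proof.

```python
def twos(n):
    """Number of factors of 2 in the prime factorization of the natural number n."""
    if n < 1:
        raise ValueError("n must be a natural number")
    count = 0
    while n % 2 == 0:
        n //= 2
        count += 1
    return count

# twos(p*p) is even and twos(2*q*q) is odd, so p*p == 2*q*q cannot happen.
print(all(twos(p * p) % 2 == 0 for p in range(1, 2000)))
print(all(twos(2 * q * q) % 2 == 1 for q in range(1, 2000)))
```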


EXERCISES 5.2 


1. Show that the natural numbers have the Least Upper Bound property. 
2. Complete the proof of Theorem 5.4. 


3. (a) If S and T are bounded sets, show that S ∩ T and S ∪ T are bounded.


(b) If S and T are as in (a), show that sup(S ∪ T) = max{sup S, sup T} (be sure to
justify use of "max").


(c) Is it true that sup(S ∩ T) = min{sup S, sup T}?


(d) Give a condition under which the equality in (c) would be true.


(e) Let {S_α : α ∈ A} be a collection of bounded sets (where A is finite). Show
that ∪_{α∈A} S_α is bounded.


(f) Let {S_α : α ∈ A} be a collection of bounded sets (where A is infinite). Is
∪_{α∈A} S_α necessarily bounded?


4. (a) Does the collection of sets with the ordering given in Example 4.1.3 have
the Least Upper Bound property? 


(b) What if the underlying set (the one whose subsets are ordered in this way) 
is infinite? 


5. There is an element whose square is 2 in R but not in Q. In Z_3 we have 0² =
0 and 1² = 2² = 1, so there is no element in Z_3 whose square is 2. Are there
any values of p for which Z_p has an element whose square is 2? Can you
characterize those numbers p for which Z_p has an element whose square is 2?


6. For any set S, let −S = {x : x = −s for some s ∈ S}. If S is bounded below,
show that −S is bounded above and that sup(−S) = −inf S (this allows us to
use theorems about suprema to say things about infima).


7. Here are two more ways to prove that √2 is irrational:


(a) (i) Show that if n is a natural number and n² is even, then n is even.


(ii) Assume that p² = 2q² and that p and q have no common factors. Derive a
contradiction.


(b) (i) Show that if x is a positive number with x² = 2, then 1 < x < 2.

(ii) Let S = {n ∈ N : n√2 ∈ N}. If √2 is rational, we have S ≠ ∅. Let q be the
least element of S. Examine q√2 − q and derive a contradiction.

(c) One's preference of a proof of the irrationality of √2 depends in part on
which number-theoretic results¹ one is willing to consider most evident.
Review the three proofs of the irrationality of √2 and pick out the
number-theoretic results that are needed to make each of them work.


8. The following "proof" contains at least two serious errors. Draw a picture to
illustrate what the author of this "proof" thought they were doing. Find the
errors and explain why they are errors. Finally, give an example to show that
the result is false.


"THEOREM": Every nonempty set is a neighborhood of at least one of
its points.


"PROOF": Let A be a nonempty set. Let u = sup A. Since u = sup A, there
is a number ε > 0 so that (u − ε, u) ⊆ A. Let x = u − ε/2. Now the interval
(x − ε/4, x + ε/4) is contained in the interval (u − ε, u), and so (x − ε/4, x
+ ε/4) is contained in A. Thus A is a neighborhood of x.


9. (a) If S is a nonempty bounded set, show that S ⊆ [inf S, sup S].
(b) If S is as in (a) and I is a closed interval with S ⊆ I, show that
[inf S, sup S] ⊆ I.


(c) If S is as in (a), show that [inf S, sup S] = ∩ I, where the intersection is taken
over all closed intervals I containing S.


(d) What is ∩ I, where the intersection is taken over all open intervals I
containing S?


10. (a) Let S be a nonempty set that is bounded above, and let


T = {x : x is an upper bound for S}.


Show that T is nonempty and bounded below and that sup S = inf T.
(b) Let S be a nonempty set that is bounded below, and let


T = {x : x is a lower bound for S}.


Show that T is nonempty and bounded above and that sup T = inf S.


(c) Use (b) to establish a Greatest Lower Bound axiom. (Since you will be 
proving it, this will be a Greatest Lower Bound theorem.) 


11. (a) Modify the proof of Theorem 5.4 to show that
(i) there is no rational number whose square is 3, and
(ii) in any ordered field in which the Least Upper Bound axiom holds,
there is a positive element whose square is 3.


(b) Where does the proof of the first part of Theorem 5.4 fail if one tries to
use the same method to show that there is no rational number whose square is
4?


12. (a) Define the sum of two sets A and B to be 


A + B = {z : z = x + y for some x ∈ A and y ∈ B}.


If A and B are bounded above, show that A + B is bounded above and that
sup(A + B) = sup A + sup B.


(b) Define a "multiple" of a set to be 


kA = {z : z = kx for some x ∈ A}.


If k > 0 and A is bounded above, show that kA is bounded above and sup kA 
= k(sup A). What happens if k < 0? 


(c) We have definitions of "addition" and "scalar multiplication" for sets. Is 
the collection of subsets of the real line a vector space under this structure? 


(d) Along these same lines, we could define the "product" of two sets to be 
AB = {z : z = xy for some x ∈ A and y ∈ B}. Show that it is not true in general
that sup AB = (sup A)(sup B). Is this ever the case?


13. Let ⌊x⌋ denote the greatest integer less than or equal to x (so that, for
instance, ⌊π⌋ = 3, ⌊−5.79⌋ = −6, and ⌊4⌋ = 4).


(a) Show that a number x is an integer if and only if ⌊x⌋ = x.


(b) Show that x is rational if and only if there is a natural number n so that
⌊nx⌋ = nx.


(c) Recall from calculus that e = Σ_{k=0}^∞ 1/k!. For any natural number n, show
that ⌊n!e⌋ = n! Σ_{k=0}^n 1/k!.


(d) Observe that n! Σ_{k=0}^n 1/k! < n!e for all n. Show that e is irrational.
14. (a) Examine the numbers √2^√2 and (√2^√2)^√2 to show that it is possible for
an irrational number raised to an irrational power to be rational. (Is the first 
one rational? If not, is the second one rational?) 


(b) Consider other combinations of bases and exponents. For instance, is it 
possible for an irrational number raised to a rational power to be rational? 
Consider what might happen if the base (or power) is an integer. 


(c) Think about whether you have really proved the result in (a). It seems that 
one of the two numbers given must be an example of the phenomenon, but 
which one is it? Have you really given an example? (This is an exercise in 
intuitionist thinking. See Exercise 1.3.1.) 


(d) Upon being posed the question in (a)—Can an irrational number raised to


an irrational power be rational?—many people respond "Just look at e!"*." Is 
this answer any better (or worse) than the one we've given here? 


¹ "Number-theoretic results" include, among other things, statements concerning the arithmetic of the
natural numbers, which natural numbers divide evenly into which others, and the ways numbers can be 
factored. 
6.1 THE INTEGER PART OF A NUMBER—THE
ARCHIMEDEAN PROPERTY 


In this chapter we will finally establish the association between elements of an 
ordered field and decimal expansions. Two theorems (both of which appear at 
first to be about other things altogether!) will lead us to this reassuring result. 
The first of these theorems lets us find the integer part of an element of our field. 
For the moment, we will consider only positive elements of the ordered field. If 
x ∈ P and n is a natural number with x ∈ (n − 1, n], we call n − 1 the integer part¹
of x. We will show that such a natural number exists by attacking a similar but 
smaller problem. Instead of finding what seems to be the smallest natural 
number larger than x, we will show first that x is smaller than some natural 
number. This is not as obvious as it might seem. 


Recall the field of formal rational functions in Chapter 3. We may make this 
an ordered field by saying that p(x)/q(x) is positive if the coefficients of the
highest-order terms of p(x) and q(x) have the same sign. Now x/1 is a formal
rational function, as is n/1 (which corresponds to the natural number n in this
field). Note that (x/1) − (n/1) = (x − n)/1 is positive no matter what n is. In other
words, x/1 is larger than all the natural numbers in this field. Recall that there is
no such element in the rational numbers (Exercise 4.6.8). While the structure of 
the field of formal rational functions remains somewhat of a mystery, we now 
know that this field is definitely different from the rational numbers. 
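

The ordering just described is easy to experiment with. The sketch below is a hedged illustration, not a full implementation of the field: it assumes polynomials are stored as coefficient lists (lowest degree first), and the names `leading` and `is_positive` are hypothetical.

```python
def leading(coeffs):
    """Leading coefficient of a polynomial [c0, c1, ...]; 0 for the zero polynomial."""
    for c in reversed(coeffs):
        if c != 0:
            return c
    return 0

def is_positive(p, q):
    """p(x)/q(x) is positive when the leading coefficients of p and q share a sign."""
    if leading(q) == 0:
        raise ZeroDivisionError("the denominator must not be the zero polynomial")
    return leading(p) * leading(q) > 0

# (x/1) - (n/1) = (x - n)/1 has numerator -n + 1*x and denominator 1.
print(all(is_positive([-n, 1], [1]) for n in range(1, 10_001)))
```

The check passes for each n tried because the leading coefficient of x − n is 1 regardless of n, which is the observation made in the text.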


Theorem 6.1 tells us that, in an ordered field with the Least Upper Bound 
property, there is no element larger than all the natural numbers. The conclusion 
of Theorem 6.1 is called the Archimedean property. A field for which it holds 
is said to "have the Archimedean property" or simply to "be Archimedean." 


THEOREM 6.1: Let (F, P) be an ordered field having the Least Upper Bound
property, and let x be any element of F. Then there is a natural number n_x with
n_x > x.


PROOF: This statement is the same as saying N ⊆ F is not bounded above.
Suppose N is bounded above. Then, by the Least Upper Bound property,² N
must have a supremum, call it u. Taking ε = 1 in Theorem 5.3, there is a natural
number n_0 with n_0 > u − 1. Now n_0 + 1 is also a natural number, and u < n_0 + 1.
This contradicts the assertion that u = sup N (because u is not an upper bound for
N) and thus contradicts the assumption that N is bounded above. ∎


The Archimedean property has several important corollaries: 


COROLLARY 6.2: If (F, P) is an ordered field having the Least Upper Bound
property, then


(a) If x ∈ P, there is a natural number n_x with n_x − 1 < x ≤ n_x (this allows us to
define the integer part of a positive number).


(b) If x ∈ P, there is a natural number n_x with 1/n_x < x (this allows us to complete
Example 5.1.3).


(c) For any x, y ∈ P, there is a natural number n so that nx > y.


(d) Every nonempty open interval in F contains both a rational element of F and 
an irrational element of F. 


PROOF: The proofs of parts (b) and (c) are left as Exercise 6.1.2. (a) Let x ∈ P.
By the Archimedean property, B = {n ∈ N : n ≥ x} ≠ ∅. According to the
well-ordering property, B has a least element. Call it n_x. Then n_x ≥ x because n_x ∈ B,
and n_x − 1 < x because n_x − 1 ∉ B.


(d) Let the interval be (a, b). Since b − a > 0, part (c) of the Corollary says there
is a natural number n with n(b − a) > 1. By Exercise 4.9.3, there is a natural
number m in (na, nb). Then na < m < nb, and, since n > 0, we may divide to
obtain a < m/n < b. This is the rational element we were after. Now let p be a
rational element with a < p < b, and r a rational element with a < p < r < b.
Then (1/√2)p + (1 − 1/√2)r is an irrational element of the type we were looking
for (you will check this in Exercise 6.1.3). ∎
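

Parts (a) and (d) of the Corollary are constructive enough to be tried out. The following is a minimal sketch, assuming rational endpoints so that the arithmetic is exact; the function names are hypothetical. With rational endpoints a midpoint would do just as well, so the point is only to trace the steps of the proof. For negative a the numerator m is an integer rather than a natural number.

```python
import math
from fractions import Fraction
from itertools import count

def least_natural_at_least(x):
    """Least natural number n with n >= x, for x > 0 (the least element of B in part (a))."""
    return next(n for n in count(1) if n >= x)

def rational_between(a, b):
    """A rational m/n with a < m/n < b, following the proof of part (d)."""
    a, b = Fraction(a), Fraction(b)
    if not a < b:
        raise ValueError("need a < b")
    n = next(k for k in count(1) if k * (b - a) > 1)   # part (c): n(b - a) > 1
    m = math.floor(n * a) + 1                          # an integer with na < m < nb
    return Fraction(m, n)

x = Fraction(7, 2)
print(least_natural_at_least(x) - 1)                   # integer part of 7/2
print(rational_between(Fraction(1, 3), Fraction(2, 5)))
```

The searches are deliberately naive (they count upward from 1), which mirrors the appeal to the well-ordering property rather than aiming at efficiency.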


A subset of the real numbers is called dense in the real line (or just dense) if its 
intersection with any nonempty open interval is nonempty. Corollary 6.2.d says 
that the rational numbers and the irrational numbers are both dense in the real 
line. It is often called the Density theorem for this reason. We may sharpen the 
second part of it in this way: 


THEOREM 6.3: If F is an Archimedean ordered field having an irrational
element z, and a < b, there is a rational element r with rz ∈ (a, b).


PROOF: Left as Exercise 6.1.5. ∎


Theorem 6.3 is quite relevant to us, since we are only certain that one number 
(√2) is irrational. We will find that having a specific example of an object is often
more useful than knowing that lots of them exist. On the other hand, since, as we 
will see shortly, there are uncountably many irrational numbers, there must be at 
least one of them. There must be irrational numbers because there aren't enough 
rational numbers to account for all the real numbers. This counting technique to 
prove that something exists is very handy. We used it in Exercise 2.2.5 to show 
that there must be transcendental numbers, thereby saving us the trouble of 
actually having to find one. 


EXERCISES 6.1 


1. (a) Verify that the formal rational functions are an ordered field with the 
positive set given. 


(b) Show that (x/1) − (n/1) ∈ P for all n.


2. (a) Prove Corollaries 6.2.b and 6.2.c. 
(b) Modify Corollary 6.2.a to include negative numbers. 


(c) If a < b, show that (a, b) ∩ Q and (a, b) ∩ (R\Q) are both infinite. (Corollary
6.2.d says they are nonempty.)


(d) If D is any dense subset of the real numbers and a < b, show that D ∩ (a, b)
is infinite. (Again, the definition of dense only stipulates that this intersection
is nonempty.)


(e) Show that no finite set can be dense in the real line.


3. (a) If 0 < t < 1 and a < b, show that a < ta + (1 − t)b < b.
(b) Complete the proof of Corollary 6.2.d.


4. Use Exercise 4.10.3 to prove the Density Theorem (as it relates to irrational
numbers).


5. Prove Theorem 6.3.


6. (a) Suppose x_k is a real number for k = 1, 2, ..., and that there is a positive
number ε so that x_k > ε for each k. If B is any real number, show that there is
a natural number n so that x_1 + x_2 + ... + x_n > B.


(b) Show that this need not be the case if we assume only that x_n > 0 for all n.


7. Use Bernoulli's inequality (Exercise 4.5.19) to show that if x > 1, the set of
numbers {x^n} is unbounded.


8. (a) Show in detail that inf{1/2^n} = 0.
(b) Why is this exercise here rather than in Chapter 5? 


9. (a) If T is a linearly ordered set and S ⊆ T, we say that S and T are
coterminal if for each t ∈ T there is an s ∈ S with s > t, and vice versa. If T
and S are coterminal ordered fields, show that T is Archimedean if and only
if S is Archimedean.


(b) Could we use this result, along with Exercise 4.6.8, to show that the real
number line is Archimedean?


10. Suppose S and U are sets with the following two properties: 


(i) Each element of U is an upper bound for S.
(ii) For any n ∈ N, there are s ∈ S and u ∈ U with u − s < 1/n.
(a) Show that each element of S is a lower bound for U.
(b) Show that sup S = inf U.
(c) If u is an upper bound for a set X with the property that for any n ∈ N
there is an x ∈ X with u − x < 1/n, then u = sup X.


(d) Show that u is an upper bound for a set X if and only if u + 1/n is an upper
bound for X for all n.


(e) Combine (c) and (d) to show: u = sup X if and only if, for each natural
number n, u + 1/n is an upper bound for X and u − 1/n is not.


(f) This question is about suprema. Why is it in this chapter instead of in the 
previous one? 


11. (a) Show that an element of the positive set in the field of formal rational 
functions, considered now as a genuine function, does not necessarily take 
on only positive values. Can we say anything about the values of a "positive" 
formal rational function? 


(b) Define a subset P of the field of formal rational functions by 
P = {f : f(x) > 0, ∀x ∈ R}.
Is this a positive set?


12. (a) If D is dense in the real line and D ⊆ S, show that S is dense in the
real line. 


(b) Show that if S' is dense in the real line and a finite number of points are 
removed from S, the resulting set is also dense in the real line. 


(c) Does (b) necessarily remain true if the set that is removed is infinite? 


(d) Show that every dense set has a proper subset that is also dense. 


13. The dyadic rationals are rational numbers of the form n/2^m for some
integer n and some natural number m. 


(a) Show that not every rational number is a dyadic rational. 
(b) Show that the set of dyadic rationals is countable. 
(c) Show that the set of dyadic rationals is dense in the real line. 


(d) Show that all of this is true if the denominators of the fractions are 
powers of any natural number greater than 1 (of course, such things aren't the
dyadic rationals anymore). 


14. (a) Show that the set of natural numbers does not have a supremum in any 
ordered field (Archimedean or not). 


(b) Let F be a non-Archimedean ordered field, and let 
U = {x : x is an upper bound for N}.


Show that (i) U ≠ ∅, and (ii) U is bounded below but has no infimum.
(c) Doesn't part (a) say that every ordered field is Archimedean? 


6.2 NESTS 


We will now find the fractional part of an element of our field. As before, the 
beginning of this journey may not immediately seem related to its end. 


DEFINITION 6.4: A collection of sets, S_1, S_2, ..., is a nest if S_n ⊇ S_{n+1} for n =
1, 2, ....


We are interested only in nests of intervals of real or rational numbers, say {[a_n,
b_n]}, which we may visualize like this:


[Diagram: a number line with the left endpoints a_1, a_2, a_3, a_4, ... marked from the left and the right endpoints ..., b_4, b_3, b_2, b_1 marked toward the right.]


It need not happen (as it does in this diagram) that a_n < a_{n+1} and b_n > b_{n+1} for all
n. Consecutive endpoints (or consecutive sets for that matter) might be the same.


THEOREM 6.5: (The Nested Intervals Property) If F is an ordered field having
the Least Upper Bound property, then


(a) Any nest {I_n} of nonempty, closed, bounded intervals has a nonempty
intersection (that is, ∩_n I_n ≠ ∅).

(b) If the infimum of the lengths of the intervals I_n is 0, there is an element of F,
say x, so that ∩_n I_n = {x}.


PROOF: (a) Let I_n = [a_n, b_n]. Since I_n ⊇ I_{n+1}, we have a_1 ≤ a_2 ≤ ... and b_1 ≥ b_2
≥ ... (by Exercise 4.9.4). We also have a_1 ≤ b_1, a_2 ≤ b_2, .... We begin by showing
that a_i ≤ b_j for every i and j (note that this is not part of the "given"). There are
three cases to consider, depending on whether i = j, i < j, or i > j. If i = j, it is
given that a_i ≤ b_i. If i < j, note that a_i ≤ a_j ≤ b_j. Finally, if i > j, then a_i ≤ b_i ≤ b_j (it
is easy to see what has happened here by looking at i and j equal to 2 or 3 in the
diagram above).


Let A = {a_1, a_2, ...}. Then b_1 is an upper bound for A (as is each b_n). By the
Least Upper Bound property, A has a supremum. Call it x. By the definition of
supremum, x ≥ a_n for each n and, since each b_n is an upper bound for A, x ≤ b_n
for each n (be sure you see why this is so). So a_n ≤ x ≤ b_n. That is, x ∈ I_n for each
n. It follows that x ∈ ∩_n I_n, and so this intersection is not empty.

(b) Suppose x and y are elements of ∩_n I_n. Then x and y are in I_n for each n, and
so |x − y| ≤ b_n − a_n for each n (Theorem 4.19). Since inf{b_n − a_n} = 0, and |x − y|
≥ 0, it can only be that |x − y| = 0, that is, x = y. ∎
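

A concrete nest may help in following the proof. The sketch below builds one by bisection around the positive solution of x² = 2; this particular nest, the use of rational endpoints, and the function name `sqrt2_nest` are all assumptions made for illustration and do not come from the text.

```python
from fractions import Fraction

def sqrt2_nest(steps):
    """Closed intervals [a_n, b_n] with a_n**2 <= 2 <= b_n**2, built by bisection."""
    a, b = Fraction(1), Fraction(2)
    nest = [(a, b)]
    for _ in range(steps):
        mid = (a + b) / 2
        if mid * mid <= 2:
            a = mid          # keep the right half
        else:
            b = mid          # keep the left half
        nest.append((a, b))
    return nest

nest = sqrt2_nest(20)
# Left endpoints never decrease and right endpoints never increase ...
assert all(p[0] <= q[0] and q[1] <= p[1] for p, q in zip(nest, nest[1:]))
# ... and every left endpoint is at most every right endpoint, as in part (a).
assert all(ai <= bj for ai, _ in nest for _, bj in nest)
print(float(nest[-1][0]), float(nest[-1][1]), nest[-1][1] - nest[-1][0])
```

The lengths halve at each step, so their infimum is 0 and part (b) applies in a field with the Least Upper Bound property. The code can only exhibit endpoints; that the intersection is nonempty is what the theorem supplies.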


EXERCISES 6.2 
1. In the notation of the proof of Theorem 6.5, show that 
∩_n [a_n, b_n] = [sup{a_n}, inf{b_n}].


2. Give an example of a nest of bounded intervals whose intersection is empty 
and of a nest of closed intervals whose intersection is empty. 


3. Can the Nested Intervals property be weakened to allow intervals that are not 
closed? For instance, does the result remain true if the intervals are required 
to be of the form [a, b)? How about (a, b]? What if some of the intervals are 
of the form [a, b) and some are of the form (a, b]?


4. (a) In the field of formal rational functions, construct a nest of closed, 
bounded intervals whose intersection is empty. (That is, show that the Nested 
Intervals property fails in this field.) 


(b) Discuss whether the Nested Intervals property must fail in any non- 
Archimedean ordered field. 


6.3 THE FRACTIONAL PART OF A NUMBER—DECIMAL 
EXPANSIONS 


We can now complete the association between elements of a complete ordered 
field and decimal expansions. Having done this, we may feel secure in referring 
to this field as the real numbers. If k is the integer part of the positive number x
(provided by Corollary 6.2.a), then x − k ∈ (0, 1]. We will devise a procedure for
associating each element of the interval (0, 1] with a decimal expansion, and vice
versa.


THEOREM 6.6: In an Archimedean ordered field in which the Nested Intervals
property holds, there is a one-to-one correspondence between the interval I = (0,
1] and the nonterminating decimal expansions of the form 0.d_1d_2d_3 ....


PROOF: Let x ∈ I. We will describe a decimal expansion of the proper form (if
you fuss over all the details of this argument, you will see a good illustration that 
clarity and precision do not always go hand in hand). 


[Figure: the interval from 0 to 1 divided into tenths, with the subinterval from 2/10 to 3/10 further divided into hundredths at 21/100, 22/100, ..., 29/100.]


Divide I into 10 disjoint subintervals of the form (a, b], each of length 1/10.
Since these intervals are disjoint and their union is I, one of them (and only one
of them) must contain x. If x is in the kth interval from the left, set d_1 = k − 1. For
instance, d_1 = 2 if x is in (2/10, 3/10]. Now divide the interval just selected into
10 disjoint subintervals of length 1/100 and choose d_2 in the same way. This is
illustrated in the diagram above. Continue in this way, dividing the current
interval into 10 parts and assigning the next decimal place. We have associated x 
with a decimal expansion. That this association is one-to-one is seen as in the 
proof of Theorem 6.5.b. 


We've done only half the proof (and we haven't used the Nested Intervals 
property yet!). We now show that every decimal expansion corresponds to an 
element of I (that is, that this correspondence between numbers and decimals is 
onto). This is quite simple, since the process we've just described can be 
reversed. Just use the digits to select the intervals instead of the other way 
around. We must replace half-open intervals with the associated closed ones in 
order to use the Nested Intervals property. Observe that the infimum of the 
lengths of the intervals in the resulting nest is 0. By the Nested Intervals 
property, there is a number x that is the only element of the intersection of this 
nest. This is the number to which the decimal corresponds. ∎ 


If k is the integer part of x > 0, and 0.d₁d₂d₃ ... is the decimal expansion of x − 
k, found as above, then the decimal expansion of x is k.d₁d₂d₃ .... 


COROLLARY 6.7: There are uncountably many real numbers. There are 
uncountably many irrational numbers. 


PROOF: The first statement follows from Exercise 2.3.6, Cantor 
diagonalization, and Theorem 6.6. Note that R = Q ∪ (R\Q). If the set of irrational 
numbers were countable, we would have written R as a union of two countable 
sets and R would be countable. ∎ 


Corollary 6.7 answers the Big Question to the extent that we now know that the 
real numbers and the rational numbers are indeed different. It might be a bit 
unsatisfying that this answer does not refer to any aspect of the sets other than 
cardinality. Here is an example that involves the order structure: 


EXAMPLES 6.3: 1. By Theorem 5.4, we know there is an irrational number 
whose square is 2. By Theorem 6.6, this number has a decimal expansion (which 
we suspect begins 1.414...). Consider the intervals J₀ = [1, 2], J₁ = [1.4, 1.5], 
J₂ = [1.41, 1.42], .... (If the truncation of the decimal expansion of √2 to n 
places is rₙ, let Jₙ = [rₙ, rₙ + 10⁻ⁿ].) Note that √2 ∈ Jₙ for all n (this is how the 
decimal places were chosen). Let Iₙ = Jₙ ∩ Q. Since the endpoints of each Jₙ are 
rational, each Iₙ contains its endpoints. Thus each Iₙ is a closed, bounded interval 
of rational numbers. But ∩ₙ Iₙ = (∩ₙ Jₙ) ∩ Q = {√2} ∩ Q = ∅. Thus {Iₙ} is a nest 
of closed, bounded intervals of rational numbers whose intersection is empty. We 
see that Q doesn't have the Nested Intervals property. 
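
The nest in this example can be computed exactly. The following Python sketch is 
an illustration only: the function name sqrt2_nest is hypothetical, and it uses 
math.isqrt to obtain the truncation rₙ as the largest n-place decimal whose square 
is less than 2, which is an assumption about how one might compute rₙ and not a 
method taken from the text. 

```python
from fractions import Fraction
from math import isqrt


def sqrt2_nest(n_max):
    """Return the intervals J_0, ..., J_{n_max} as pairs (r_n, r_n + 10**-n),
    where r_n is the decimal truncation of the square root of 2 to n places."""
    intervals = []
    for n in range(n_max + 1):
        # isqrt(2 * 10**(2n)) is the integer part of sqrt(2) * 10**n.
        r = Fraction(isqrt(2 * 10 ** (2 * n)), 10 ** n)
        intervals.append((r, r + Fraction(1, 10 ** n)))
    return intervals


nest = sqrt2_nest(6)
for lo, hi in nest:
    # Every endpoint is rational, and the squares bracket 2 strictly.
    assert lo * lo < 2 < hi * hi
print(nest[2])
```

Each interval has rational endpoints whose squares lie strictly on either side of 
2, so no rational number can belong to all of them. 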


EXERCISES 6.3 


1. (a) By splitting the intervals in halves instead of tenths, modify Theorem 6.6 
to show that every element of a complete ordered field has a binary (base 2) 
expansion. 


(b) Show that every element of a complete ordered field has a ternary (base 
3) expansion. 


(c) Would the process described in Theorem 6.6 still work if one used 
different numbers of intervals at each stage? Is there any reason to do this? 


2. (a) What decimal expansion does Theorem 6.6 assign the number 1? 


(b) Show that Theorem 6.6 doesn't assign any number a terminating decimal 
expansion. 


(c) Describe the numbers that Theorem 6.6 assigns decimal expansions 
ending in an infinite string of 9s. 


(d) Show that Theorem 6.6, as modified in Exercise 1, doesn't assign any 
number a terminating expansion. 


(e) Describe the numbers that the procedure in Exercise 1 assigns binary 
expansions ending in an infinite string of 1s. Describe the numbers that the 
procedure in Exercise 1 assigns ternary expansions ending in an infinite 
string of 2s. 


(f) Suppose we begin by dividing the intervals into b equal parts. Repeat this 
exercise for such a procedure. (Note then that every element of a complete 
ordered field has an expansion in any number base.) 


3. Modify the discussion in the chapter to show that negative numbers have 
decimal expansions. 


4. (a) Suppose we've selected a finite portion of the decimal expansion for x as 
in Theorem 6.6: 0.d₁d₂ ... dₙ₋₁dₙ. Show that 


d₁/10 + d₂/100 + ⋯ + dₙ/10ⁿ < x ≤ d₁/10 + d₂/100 + ⋯ + (dₙ + 1)/10ⁿ. 


(b) Show that the decimal expansion of √2 begins 1.414.... 


5. How is the Archimedean property used in the proof of Theorem 6.6? 


6. The intervals Jₙ in Example 6.3.1 were chosen to make ∩ₙ Jₙ = {√2}. Why 
couldn't we have used the intervals [√2 − 1/n, √2 + 1/n] and get the same 
result? 


7. (a) Show that any repeating decimal represents a rational number and that 
every rational number is represented by a repeating decimal. 


(b) Show that there are countably many repeating decimals. 
(c) Show that there are uncountably many nonrepeating decimals. 


(d) Why does this exercise appear here rather than in Chapter 2? 


8. Comment on the following as a definition of addition for real numbers: 
Suppose the integer parts of x and y are m and n and the fractional parts of x 
and y are associated by Theorem 6.6 with nests {[aₙ, bₙ]} and {[cₙ, dₙ]}. 
Note that aₙ, bₙ, cₙ, and dₙ are rational (so we know how to add them). Let z 
be the unique element of ∩ₙ [aₙ + cₙ, bₙ + dₙ]. If z < 1, let x + y have integer part 
m + n and fractional part z. If z > 1, let x + y have integer part m + n + 1 and 
fractional part the same as the fractional part of z. If z = 1, let x + y = m + n 
+ 1. (You should begin by examining closely the sentence in italics.) 


9. Prove Theorem 6.1 again by showing that no element of an ordered field can 
possibly satisfy the conditions of Theorem 5.3 and consequently that no 
element of such a field can be the supremum of N. 


1 In keeping with our earlier stipulation that there should be no terminating decimals, the integer part of 
the natural number m is m − 1 (so, for instance, the decimal expansion of the natural number 23 will turn 
out to be 22.999...). 


2 Notice that the assumption that N is bounded above gives us more information than we had before, 
since it allows us to use the Least Upper Bound property. This is a good proof to do by contradiction. 


7.1 POINTS AND SETS 


A real number and a set of real numbers might be related in several ways. The 
point might be an element of the set or it might not, but this is just the simplest 
case. Our work has shown us other possibilities: 


(1) The set might be a neighborhood of the point. 


(2) The point might be an upper or lower bound for the set. 
(3) The point might be the supremum or infimum of the set. 


(4) The point might be related to the complement of the set in one of these 
ways. 


These statements are not unrelated: The supremum of a set is an upper bound of 
the set; a set contains any point of which it is a neighborhood; and so on. On the 
other hand, a set can't be a neighborhood of its supremum (be sure you see why 
this is so). Something that can't happen is often as interesting as something that 
can (or must). We will look at this list from two points of view and from 
different beginnings arrive at much the same end. 


EXERCISES 7.1 


1. Show that a set can't be a neighborhood of its supremum. 


2. (a) Show that a closed, bounded interval contains its supremum and infimum 
and that an open interval contains neither. 


(b) Give examples of (i) a set that is not a closed, bounded interval, but 
nevertheless contains its supremum and infimum, and (ii) a set that is not an 
open interval but nevertheless does not contain its supremum or infimum. 


7.2 ONE POINT OF VIEW 


Loosely speaking, a set is a neighborhood of a point if the set contains 
everything "near" the point. Can we weaken this requirement? Take a set S and a 
point s and consider the following statements, the first being the definition of 
neighborhood: 


(1) There is an ε > 0 so that S contains all points of (s − ε, s + ε). 

(2) There is an ε > 0 so that S contains all but finitely many points of 
(s − ε, s + ε). 

(3) There is an ε > 0 so that S contains infinitely many points of (s − ε, s + ε). 

(4) There is an ε > 0 so that S contains a point other than s of (s − ε, s + ε). 

(5) For every ε > 0, S contains infinitely many points of (s − ε, s + ε). 

(6) For every ε > 0, S contains a point other than s of (s − ε, s + ε). 


Each of statements (2), (3), and (4) seems to be a weakening of the one before it. 
(When we say statement (2) is weaker than statement (1), we mean there might 
be more points and sets satisfying the former than the latter.) Statements (5) and 
(6) are altered in a different way. Though the second statement seems weaker 
than the first, in a sense this is not really so: 


THEOREM 7.1: (a) Statement (1) implies statement (2); 
(b) If s ∈ S, then statement (1) and statement (2) are equivalent. 


PROOF: (a) is clear, since ∅ is finite. 

(b) If statement (2) holds, then {|x − s| : x ∈ (s − ε, s + ε)\S} is a finite set of 
positive numbers (positive because s ∈ S). If this set is empty, statement (1) 
already holds for ε itself. Otherwise it has a positive infimum (Exercise 5.1.8). If 
ε₁ is this infimum, then (s − ε₁, s + ε₁) ⊆ S and statement (1) holds. ∎ 
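
The choice of ε₁ in part (b) amounts to taking the smallest distance from s to a 
missing point. A minimal Python sketch follows, assuming the finitely many 
missing points are handed over as a list; the function name shrink_radius and 
its parameters are hypothetical. 

```python
from fractions import Fraction


def shrink_radius(s, eps, missing):
    """Given the finitely many points of (s - eps, s + eps) that are not in S
    (none of them equal to s), return a radius eps1 such that
    (s - eps1, s + eps1) contains none of them."""
    if any(x == s for x in missing):
        raise ValueError("s itself must belong to S")
    # Only points inside the original interval matter.
    distances = [abs(x - s) for x in missing if abs(x - s) < eps]
    if not distances:
        return eps  # nothing is missing, so the original radius already works
    return min(distances)


s, eps = Fraction(0), Fraction(1)
missing = [Fraction(1, 2), Fraction(-1, 4), Fraction(3, 4)]
eps1 = shrink_radius(s, eps, missing)
print(eps1)
```

Every missing point is at distance at least eps1 from s, so none of them lies in 
the open interval of radius eps1 about s. 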


If s is one of the "missing points" in statement (2), S is called a deleted 
neighborhood of s. These are of considerable importance in calculus but of little 
interest to us now (statements like "f(x) does so and so if 0 < |x − a| < δ" are 
references to deleted neighborhoods). It is left to the reader to decide that the 
situations described in statements (3) and (4) are so simple as to be uninteresting. 


THEOREM 7.2: Statements (5) and (6) above are equivalent. 


PROOF: Clearly statement (5) implies statement (6) since an infinite set has at 
least two elements (one of which might be s). Conversely, suppose statement (6) 
holds. Let ε > 0 be given; then S ∩ (s − ε, s + ε) contains a point other than s. 
Call this point s₁. Now let ε₁ = |s − s₁| > 0. There is a point of S other than s in 
(s − ε₁, s + ε₁); call it s₂. Note that s₂ ≠ s₁. Continuing in this way, we find 
s₁, s₂, s₃, ..., infinitely many points of S ∩ (s − ε, s + ε). ∎ 
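
The construction in this proof can be traced on a concrete set. The Python sketch 
below is illustrative only: it takes S = H = {1/n : n ∈ N} and s = 0, and the 
names point_of_H_near_zero and distinct_points are hypothetical. The first 
function plays the role of statement (6) by supplying a point of S other than s 
within a given radius. 

```python
from fractions import Fraction
from math import floor


def point_of_H_near_zero(radius):
    """Return a point 1/n of H with 0 < 1/n < radius (so it is not 0)."""
    n = floor(1 / radius) + 1  # n > 1/radius, hence 1/n < radius
    return Fraction(1, n)


def distinct_points(s, eps, pick, count):
    """Follow the proof: pick a point within the current radius of s,
    then shrink the radius to the distance from s to that point."""
    points = []
    radius = eps
    for _ in range(count):
        p = pick(radius)
        points.append(p)
        radius = abs(s - p)  # the next point must be strictly closer to s
    return points


pts = distinct_points(Fraction(0), Fraction(1, 2), point_of_H_near_zero, 4)
print(pts)
```

Each new point is strictly closer to s than the previous one, which is why the 
points are all different. 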


EXERCISES 7.2 


1. Draw a picture to illustrate the proof of Theorem 7.1.b. 


2. Explain the comment about statements (3) and (4). 


7.3 ANOTHER POINT OF VIEW 


Consider the statement "S is not a neighborhood of s." Then it is not the case that 
∃ε > 0 ∋ ((s − ε, s + ε) ⊆ S). That is, ∀ε > 0 ((s − ε, s + ε) ⊄ S), or 
∀ε > 0 ((s − ε, s + ε) ∩ C(S) ≠ ∅), which is very much like statement (6) applied to 
the set C(S) (how is it different?). A point seems to have the property we're 
talking about (whatever that might be) as it relates to C(S) if S is not a 
neighborhood of it. 


7.4 CLUSTER POINTS 


We will not have much occasion to use the latter characterization. We have 
mentioned it only to show that the idea can arise in more than one way. 
Statements (5) and (6) are equivalent, and so we may take either of them as our 
definition. The fifth is a bit more suggestive: 


DEFINITION 7.3: The point s is a cluster point of the set S if, for every ε > 0, 
the set (s − ε, s + ε) ∩ S is infinite.¹ 


This is the same as requiring the intersection of every neighborhood of s with S 
to be infinite. Theorem 7.2 says that a point s is a cluster point of a set S if and 
only if, for every ε > 0, S ∩ (s − ε, s + ε)\{s} ≠ ∅. Knowing that a set is infinite 
would seem to be more useful than knowing just that it is not empty. Curiously, 
it is often easier to show that a set is infinite than to show that it is not empty. It 
is rare that the more useful bit of information is easier to come by. 


EXAMPLES 7.4: 1. A finite set has no cluster points since it is impossible for 
its intersection with any set to be infinite. This is not the only way a set can fail 
to have a cluster point. The set of natural numbers is infinite, yet it has no cluster 
point since (x − ε, x + ε) ∩ N is finite for any x and any ε > 0. 


2. 1/2 is a cluster point of (0, 1). If ε > 1/2, then (1/2 − ε, 1/2 + ε) ∩ (0, 1) = (0, 1). If 
ε ≤ 1/2, then (1/2 − ε, 1/2 + ε) ∩ (0, 1) = (1/2 − ε, 1/2 + ε). In either case, the 
intersection is an open interval, which is an uncountable set by Exercise 2.1.2. 
Notice that 0 is also a cluster point of (0, 1) since (0, 1) ∩ (0 − ε, 0 + ε) = 
(0, min{ε, 1}), which is an infinite set. 


We can conclude from this example that if S is a neighborhood of s, then s is a 
cluster point of S. But this does not work the other way: 0 is a cluster point of (0, 
1) even though (0, 1) is not a neighborhood of 0. It is not necessary for a set to 
be a neighborhood of a point for the point to be a cluster point of the set; in fact, 
it is not necessary for a point to be an element of a set for the point to be a 
cluster point of the set. Furthermore, in view of Example 1, a point that is an 
element of a set need not be a cluster point of the set. 


3. 0 is a cluster point of H = {1, 1/2, 1/3, ...}. If ε > 0 is given, and n is such that 
1/n < ε (by Corollary 6.2.b), then {1/k : k ≥ n} ⊆ (0 − ε, 0 + ε) ∩ H, so this 
intersection is infinite. 


EXERCISES 7.4 


1. Show that s is a cluster point of S if and only if S ∩ U is infinite whenever U 
is a neighborhood of s. 


2. If S is a set that is bounded above, show that sup S is either an element of S 
or is a cluster point of S. 


3. We may say that x is a "right" cluster point of a set S if, for any 
ε > 0, S ∩ (x, x + ε) ≠ ∅, and similarly for "left" cluster points. We could also 
insist that these intersections be infinite. 


(a) Show that the two definitions of right cluster point suggested above are 
equivalent, and the two definitions of left cluster point suggested above are 
equivalent. 


(b) Show that every right cluster point is a cluster point, but that not every 
cluster point is a right cluster point. Similarly for left cluster points. 


(c) Examine the examples of cluster points in the chapter. Which are right 
cluster points and which are left cluster points? 


(d) If x is a cluster point of S, must it be the case that x is either a right cluster 
point or a left cluster point of S? 


7.5 DERIVED SETS 


We'll take a detour at this point to identify the set of all cluster points of (0, 1). 
Every element of (0, 1) is a cluster point of it [since (0, 1) is a neighborhood of 
each of its points], and 0 and 1 are also cluster points. Are there any others? If x 
<0, the interval (x — 1, 0) is a neighborhood of x containing no element of (0, 1), 
and so x is not a cluster point of (0, 1). Similarly, if x > 1, the interval (1, x + 1) 
is a neighborhood of x containing no element of (0, 1). Thus the set of cluster 
points of (0, 1) is [0, 1]. The set of cluster points of a set S is called the derived 
set of S and is denoted S′ (read "S prime"). We see that a derived set might be 
larger than the original set. We will frame the rest of our examples in terms of 
derived sets. 


EXAMPLES 7.5: 1. It is also the case that [0, 1]’ = [0, 1]. Most closed intervals 
are equal to their derived sets. The only exceptions are those that consist of a 
single point. For instance, [3, 3] is a closed interval but has no cluster point 
(since it is finite). In any case, though, a closed interval contains all of its cluster 
points, and an open interval does not. 


2. Consider H = {1, 1/2, 1/3, ...} again. We have shown that 0 ∈ H′. If x < 0, then 
(x − 1, 0) is a neighborhood of x whose intersection with H is empty, so x is not a 
cluster point of H. If x > 0, then (x/2, ∞) is a neighborhood of x whose intersection 
with H is finite (be sure you see why). Thus 0 is the only cluster point of H; that 
is, H′ = {0}. The derived set of an infinite set can be finite (or empty), and we 
see that a set can be disjoint from its derived set. Occasionally we will want to 
build sets with specified cluster points. This example gives us a hint how to do 
it. 


3. Q′ = R. By the Density theorem, if x is any real number and ε > 0, then 
(x, x + ε) ∩ Q ≠ ∅ and so (x − ε, x + ε) contains an element of Q other than x. We 
see that the derived set can be much larger than the set itself (here the derived set 
of a countable set is uncountable). The Density Theorem also tells us that (R\Q)′ 
= R. Note that two sets can have the same derived set without being the same. 
There are no "antiderivatives" here. 


Our goal is not to study derived sets for their own sake, but we will prove one 
theorem before we move on. We write S″ for (S′)′. 


THEOREM 7.4: For any set S, S″ ⊆ S′. 


PROOF: We will examine this proof by the forward-backward method. We will 
use the alternative form of the definition given in Statement (6). This is a set 
containment problem, and so we know that it has to begin and end like this: 


Let x ∈ S″. 
* * * 
Then x ∈ S′. 


The definition of the conclusion ("x ∈ S′") includes a universally quantified 
variable, ε. We must introduce it at the beginning, as we insert the rest of the 
definition at the end: 


Let x ∈ S″. 
→ Let ε > 0 be given. 
* * * 
→ (x − ε, x + ε) ∩ S\{x} ≠ ∅. 
Then x ∈ S′. 


Let x ∈ S″. 
Let ε > 0 be given. 
→ Since x ∈ S″, x is a cluster point of S′. 
* * * 
(x − ε, x + ε) ∩ S\{x} ≠ ∅. 
Then x ∈ S′. 


Let x ∈ S″. 
Let ε > 0 be given. 
Since x ∈ S″, x is a cluster point of S′. 
→ (x − ε, x + ε) ∩ S′\{x} ≠ ∅. 
* * * 
(x − ε, x + ε) ∩ S\{x} ≠ ∅. 
Then x ∈ S′. 


We seem to be almost done, but the set we know to be nonempty is not the one 
we wish to be nonempty. Since we have found that something exists [an element 
of (x − ε, x + ε) ∩ S′\{x}], this is a good time to draw a sketch. Let y be the element 
of (x − ε, x + ε) ∩ S′\{x} that we are guaranteed. Note in particular that y ∈ S′. 


[Figure: a number line marked, from left to right, with x − ε, x, y, and x + ε.] 


(In this picture, we have put y to the right of x. We should be careful that nothing 
in our proof makes use of this since it might not always be the case.) The 
ε-interval around x is a neighborhood of y, and so contains an ε-neighborhood of y. 
Since y is a cluster point of S, we know something about ε-neighborhoods of y. 
We select a new value of ε by examining the drawing, and, with a brief 
observation, find that we're done: 


Let x ∈ S″. 
Let ε > 0 be given. 
Since x ∈ S″, x is a cluster point of S′. 
→ Let y ∈ (x − ε, x + ε) ∩ S′\{x}. 
→ Let ε₁ = min{y − (x − ε), (x + ε) − y}.** 
→ Since y ∈ S′, (y − ε₁, y + ε₁) ∩ S\{y} ≠ ∅. 
→ Let z ∈ (y − ε₁, y + ε₁) ∩ S\{y}. 
→ Then z ∈ (x − ε, x + ε) ∩ S\{x}. 
(x − ε, x + ε) ∩ S\{x} ≠ ∅. 
Then x ∈ S′. 


Most of the work in this proof was suggested by the picture. The moment in the 
proof where we find what we are looking for (which happens in the third from 
last line) pops in quite easily. We do have to remember where we want the proof 
to go to see that the statement is useful, but it is not difficult to make this 
connection. 

The line marked by ** is where we might have erroneously used the position 
of y in the picture. The ε₁-interval around y should be contained in the ε-interval 
around x. To accomplish this, ε₁ should be the distance from y to the end of the 
ε-interval. In the picture, this is x + ε − y since y is nearer the right end of the 
interval. If y were to the left of x, this would not be true, and x + ε − y would be 
too big. By choosing ε₁ as we have, we allow for both possibilities. 
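
The interval containment discussed in this paragraph can be checked numerically. 
The Python sketch below is an illustration only; the function name inner_radius 
is hypothetical, and the check covers nothing beyond the containment of the 
ε₁-interval around y in the ε-interval around x. 

```python
from fractions import Fraction


def inner_radius(x, eps, y):
    """Return eps1 = min{y - (x - eps), (x + eps) - y}, the distance from y
    to the nearer end of the interval (x - eps, x + eps)."""
    if not (x - eps < y < x + eps):
        raise ValueError("y must lie in (x - eps, x + eps)")
    return min(y - (x - eps), (x + eps) - y)


x, eps = Fraction(0), Fraction(1)
for y in (Fraction(3, 4), Fraction(-3, 4)):  # y to the right and to the left of x
    eps1 = inner_radius(x, eps, y)
    # The eps1-interval around y stays inside the eps-interval around x.
    assert x - eps <= y - eps1 and y + eps1 <= x + eps
    print(y, eps1)
```

Taking the minimum of the two distances gives the same radius, 1/4, whether y 
sits at 3/4 or at −3/4. 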


Since it is not always true that X′ ⊆ X, not every set is a derived set. 
Theorem 7.4 suggests that derived sets might be more well-behaved than are 
typical sets. We will see just which sets can be derived sets in Exercise 7.5.9. 


EXERCISES 7.5 
1. There is an error in the proof of Theorem 7.4. Find it and fix it. 
2. Show that if S ⊆ T, then S′ ⊆ T′. 


3. Find the derived sets of the following sets: 


(a) {2+ 0” sn EN} 
(b) {sin(n) : n ∈ N} 
(c) {i A. >" :nvme N} 


4. (a) Construct a set for which S″ ≠ S′. 
(b) Construct a set for which S‴ ≠ S″. 


5. Show that adding finitely many points to a set or deleting finitely many 
points from a set does not change its derived set. 


6. (a) Show that (A U B)’ = A’ U B’ (Hint: Use the forward-backward method!) 
(b) Show that the relationship in (a) holds for a union of finitely many sets. 


(c) Show that ∪_{α∈A} (A_α)′ ⊆ (∪_{α∈A} A_α)′ in any event, but that this containment 
might be strict if A is infinite. 


(d) Show that (∩_{α∈A} A_α)′ ⊆ ∩_{α∈A} (A_α)′ and that this inclusion might be strict, 
even if A is finite. 


7. Why does the following not establish that Q′ = R? By the Density theorem, if 
x is any real number and ε > 0, then (x − ε, x + ε) ∩ Q ≠ ∅. 


8. Show that S is dense in R if and only if S′ = R. 


9. (a) Construct a set whose derived set is {1, 2, 3}. 
(b) Is there a set whose derived set is Q? (Consider Theorem 7.4.) 


(c) If x ∈ S and x is not a cluster point of S, then x is called an isolated point 
of S. Show that x is an isolated point of S if and only if there is an ε > 0 so 
that (x − ε, x + ε) ∩ S = {x}. 

(d) Suppose that {x_α} is the collection of isolated points of a set S. Show that 
there exist positive numbers {ε_α} so that the ε_α-intervals around x_α are 
mutually disjoint. 


(e) Show that a set can have at most countably many isolated points. 


(f) Suppose S is a set that contains all of its cluster points. Show that S is the 
derived set of some (possibly different) set. 


7.6 THE BOLZANO-WEIERSTRASS THEOREM 


Are there circumstances under which we can guarantee that a set has a cluster 
point? We have found that a set with a cluster point must be infinite, but that 
there are infinite sets with no cluster points (N, for example). The natural 
numbers and the set H in Example 7.5.2 are very similar in some ways. If we 
look at very small pieces of the number line, H and N seem much the same. 
Looking very closely at any element of either set, we see no other point of the 
set. But H has a cluster point, and N does not. The difference must lie in the 
larger structure of the sets; it must be some property of the whole set. For 
instance, we might note that H is bounded, while N is not. It happens that this is 
just what we need. 


THEOREM 7.5: (The Bolzano-Weierstrass Theorem) If F is an Archimedean 
ordered field in which the Nested Intervals property holds, then any bounded, 
infinite subset of F has a cluster point.² 


The proof that follows is as much a piece of history as a piece of mathematics. 
This can be troublesome if the years have polished an argument to the point 
where the motivation for it can't be seen anymore. We are trying to show that a 
cluster point exists, and certainly the best way to do that is to find one. We have 
essentially only one tool available for this task—the Nested Intervals property. 
What we must do, then, is construct a nest of closed, bounded intervals whose 
intersection consists of a point that is a cluster point of the set in question. This 
point will be in every interval in the nest (since it is in the intersection) and if an 
interval contains a cluster point of a set, it very likely contains infinitely many 
points of that set (draw a sketch to convince yourself of this). We should be 
looking for a nest of intervals, each of which contains infinitely many points of 
the set. Keeping this in mind makes the proof much easier to follow. 


PROOF: Let S be a bounded, infinite set, and let b₀ be an upper bound for S and 
a₀ a lower bound, so that S ⊆ [a₀, b₀] = I₀. Let I_L be the left half of I₀, and 
I_R the right half (to be precise, I_L = [a₀, (a₀ + b₀)/2] and I_R = [(a₀ + b₀)/2, b₀]). 
Now one (or both) of S ∩ I_L or S ∩ I_R is infinite, since otherwise 
S = (S ∩ I_L) ∪ (S ∩ I_R) would be finite. Let I₁ = I_L if S ∩ I_L is infinite 
and I₁ = I_R if S ∩ I_L is finite, and let S₁ = S ∩ I₁. Now S₁ is an infinite set 
contained in the interval I₁. The process by which we obtained I₁ can be repeated 
to find another interval, I₂, which is either the left or right half of I₁, and is such 
that S₂ = S ∩ I₂ is infinite. Continuing in this way, we find a nest of intervals: 
I₁ ⊇ I₂ ⊇ ⋯, each the left or right half of the previous one and such that 
Sₙ = S ∩ Iₙ is infinite for all n. 


By the Nested Intervals property, ∩ₙ Iₙ ≠ ∅, and since the infimum of the 
lengths of the intervals Iₙ is 0, there is a real number x so that ∩ₙ Iₙ = {x}. We 
will show that x is a cluster point of S. Let ε > 0 be given. There is an interval Iₙ₀ 
whose length is less than ε (in fact, all but finitely many of them have this 
property). Since the length of Iₙ₀ is less than ε and x ∈ Iₙ₀, we have 
Iₙ₀ ⊆ (x − ε, x + ε) (by Theorem 4.20). Now Sₙ₀ ⊆ Iₙ₀ ⊆ (x − ε, x + ε) and Sₙ₀ is 
infinite (this is the way the intervals Iₙ were chosen). It follows that 
S ∩ (x − ε, x + ε) is infinite and that x is a cluster point of S. ∎ 
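
The halving process in this proof can be traced on a concrete set. The Python 
sketch below is illustrative only. A program cannot decide in general whether 
S ∩ I is infinite, so the sketch assumes that this information is supplied as a 
function; the names bisect_toward_cluster_point and h_is_infinite_on are 
hypothetical, and the example set is H = {1/n : n ∈ N}. 

```python
from fractions import Fraction


def bisect_toward_cluster_point(a, b, is_infinite_on, steps):
    """Halve [a, b] repeatedly, keeping the left half when it contains
    infinitely many points of S and the right half otherwise."""
    for _ in range(steps):
        mid = (a + b) / 2
        if is_infinite_on(a, mid):
            b = mid  # keep the left half
        else:
            a = mid  # the left half holds finitely many points; keep the right half
    return a, b


def h_is_infinite_on(lo, hi):
    """For H = {1/n}, H ∩ [lo, hi] is infinite exactly when lo <= 0 < hi."""
    return lo <= 0 < hi


interval = bisect_toward_cluster_point(Fraction(-1), Fraction(1), h_is_infinite_on, 5)
print(interval)
```

Starting from [−1, 1], the intervals close in on 0, the one cluster point of H. 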


EXERCISES 7.6 


1. Where is the Archimedean property used in the proof of the 
Bolzano-Weierstrass theorem? 


2. Consider the comments preceding the proof of the Bolzano-Weierstrass 
theorem. Under what circumstances could an interval contain a cluster point 
of a set without containing infinitely many points of the set? 


3. If a set S has no cluster points, show that S must be either finite or 
unbounded. 


4. (a) Give an example of an infinite set with no cluster point. 


(b) Give an example of an infinite set having the property that its intersection 
with any set of the form [−n, n] is finite. 


(c) Suppose S is a set with the property that S ∩ [−n, n] is finite for each n ∈ N. 
Show that S is countable. 


(d) Show that every uncountable subset of the real line has a cluster point. 


5. (a) Show that the Bolzano-Weierstrass theorem fails in the field of formal 
rational functions. 


(b) Show that the Bolzano-Weierstrass theorem fails in any ordered field that 
is not Archimedean. 


6. (a) Let S be a bounded, infinite set. Show that S′ is also bounded. 


(b) If S is as in (a), we define the limit superior of S by lim sup S = sup S′. 
Show that a = lim sup S if and only if (a − ε, ∞) ∩ S is infinite for all ε > 0 and 
(a + ε, ∞) ∩ S is finite for all ε > 0. 


(c) If S is bounded above, show that lim sup S is a cluster point of S (first 
convince yourself that this is not just part of the definition). 


(d) If S is bounded below, the limit inferior of S is given by lim inf S = inf S′. 
State and prove results similar to (b) and (c). 


(e) Show that it is possible for S’ to be bounded even if S is not. 


(f) If S is such that S′ is not bounded above, we say lim sup S = ∞. Show that 
if lim sup S = ∞, then S ∩ (a, ∞) is infinite for all a, but that this condition is 
not sufficient to guarantee that lim sup S = ∞. 


(g) If S’ =, what are lim sup S and lim inf S? 


(h) Is there any relationship between these uses of the words "lim inf" and 
"lim sup" and the usage in Exercise 1.15.9? (For each x ∈ S, consider the set 
Sₓ = {y : y < x}. Is there a relationship between lim sup S and 
lim sup{Sₓ : x ∈ S}?) 


7. (a) If S is any bounded infinite set, show that lim inf S ≤ lim sup S. 


(b) If in addition S has more than one cluster point, show that lim inf S < lim 
sup S. 


8. The following "proof" contains at least two serious errors. Find them and 
explain why they are errors. Give an example to show that the "theorem" is 
false. 


"THEOREM" Every nonempty set that is bounded above contains a point 
that is a cluster point of the set. 


"PROOF" Let A be a nonempty set that is bounded above. Since it is 
bounded above, it has a supremum, say u. By Theorem 5.3, for every ε > 0, 
there is an element of A, say a_ε, with a_ε > u − ε. Then a_ε ∈ (u − ε, u + ε), and 
so u is a cluster point of A. Since u is the supremum of A, u ∈ A, and so A 
contains one of its cluster points. 


9. (a) Show that a bounded set having exactly one cluster point is denumerable. 
(Hint: If S is such a set and c is the cluster point, consider the sets 
S ∩ ((−∞, c − 1/n) ∪ (c + 1/n, ∞)). How many elements can these sets have?) 

(b) Show that the assumption in part (a) that S is bounded is unnecessary. 

(c) Show that a set having finitely many cluster points is denumerable. 


(d) Is a set having denumerably many cluster points necessarily 
denumerable? 


(e) Is a set having uncountably many cluster points necessarily uncountable? 


7.7 CLOSING THE LOOP 


Now we wish to show that the Bolzano-Weierstrass theorem and the 
Archimedean property together imply the Least Upper Bound property.³ This 
proof is a construction and as such may not be as transparent as some other 
proofs. Before we begin, we will get an idea what it is we want to accomplish, 
but with constructions, one must often adopt the attitude "Follow along, and it 
will work out in the end." 


THEOREM 7.6: If F is an Archimedean ordered field in which the 
Bolzano-Weierstrass theorem holds, then the Least Upper Bound property also holds 
in F. 


You will pick out where the Archimedean Property is used in this proof in 
Exercise 7.7.1. We must begin with a nonempty set that is bounded above, and 
we wish to construct an auxiliary set of some sort having a cluster point that is 
the supremum of the original set. Since a set can have only one supremum, we 
should construct a set with only one cluster point. Recall that the set 
H = {1, 1/2, 1/3, ...} has only one cluster point, and so we might try to make our 
auxiliary set look something like H. This may not be much to go on, but 
keeping it in mind will make the steps of the proof a bit more reasonable. 


PROOF: Let S be a nonempty set that is bounded above, and let u₀ be an upper 
bound for S. There is a natural number n so that u₀ − n is an upper bound for S 
but (u₀ − n) − 1 is not. Let u₁ = u₀ − n. (Note that u₁ is an upper bound for S but 
u₁ − 1 is not; we strongly suspect that u₁ − 1 < sup S ≤ u₁.) If u₁ − (1/n) is not 
an upper bound for S for any natural number n, then u₁ = sup S (Exercise 6.1.10). 
If this is the case, we are done. Otherwise, {n : u₁ − (1/n) is an upper bound for 
S} ≠ ∅, and, by well-ordering, has a least element. Call this number n₁. Note that 
n₁ ≠ 1 and that u₁ − (1/n₁) is an upper bound for S but u₁ − (1/(n₁ − 1)) is not (we 
have further narrowed down where sup S might be). Let u₂ = u₁ − (1/n₁) and 
begin again: If u₂ − (1/n) is not an upper bound for S for any n, then u₂ = sup S, 
and we are done. Otherwise, let n₂ be the least natural number so that u₂ − (1/n₂) 
is an upper bound for S and let u₃ = u₂ − (1/n₂). These steps are now repeated. 


Here is a picture of one step (remember, though, that S may not be as simple as it 
appears in this diagram). 


This process might end after finitely many steps [after saying "uₖ − (1/n) is not 
an upper bound for S for any value of n, and so uₖ = sup S and we are done"]. On 
the other hand, it might never end. We may assume that it doesn't. We have made 
sets U = {uₖ : k ∈ N} and N = {nₖ : k ∈ N} so that 


(1) uₖ is an upper bound for S for every k; 
(2) uₖ₊₁ = uₖ − (1/nₖ); 
(3) uₖ − (1/(nₖ − 1)) is not an upper bound for S for any k. 


Now nₖ > nₖ₋₁ for all k, and so sup N = ∞ and inf{(1/nₖ) : k ∈ N} = 0. Since uₖ ≤ 
u₁ for all k, u₁ is an upper bound for U. Each element of S is a lower bound for 
U. Thus U is a bounded, infinite set, and so it has a cluster point by the 
Bolzano-Weierstrass theorem. Call the cluster point u. Where can u be? Suppose x is 
such that x < s for some s ∈ S and let ε = s − x. Then there can be no elements of U 
in (x − ε, x + ε) since no element of U is less than s. Therefore, if x is not an 
upper bound for S, x is not a cluster point of U. It follows that u is an upper 
bound for S. By Lemma 10.2 (!?! The terminology of Chapter 10 will make this 
lemma easier to prove), u ≤ uₖ for all k. Let d < u and let k ∈ N be such that 
1/(nₖ − 1) < u − d. Remember that uₖ − (1/(nₖ − 1)) is not an upper bound for S. 
Since d < u − (1/(nₖ − 1)) ≤ uₖ − (1/(nₖ − 1)), we see that d is not an upper bound 
for S. Since u is an upper bound for S and d is not an upper bound for S 
whenever d < u, we have u = sup S. ∎ 
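
The first few steps of this construction can be carried out for a specific set. 
The Python sketch below is an illustration under assumptions that are not in the 
text: S is taken to be the set of rational numbers q with q² < 2, the starting 
bound is u₀ = 3, and the search for each nₖ is capped at a hypothetical limit 
max_n because a program cannot test every natural number. The names 
is_upper_bound and descend are hypothetical. 

```python
from fractions import Fraction


def is_upper_bound(u):
    """True when the rational u is an upper bound for S = {q rational : q*q < 2}."""
    return u > 0 and u * u >= 2


def descend(u0, rounds, max_n=10_000):
    """Return ([u0, u1, u2, ...], [n1, n2, ...]) built as in the proof."""
    if not is_upper_bound(u0 - 1):
        raise ValueError("this sketch assumes u0 - 1 is still an upper bound")
    n = 1
    while is_upper_bound(u0 - n - 1):  # stop when (u0 - n) - 1 is not an upper bound
        n += 1
    us = [u0, u0 - n]
    ns = []
    for _ in range(rounds):
        u = us[-1]
        # Least natural number m (up to max_n) with u - 1/m an upper bound for S.
        n_k = next((m for m in range(1, max_n + 1)
                    if is_upper_bound(u - Fraction(1, m))), None)
        if n_k is None:
            break  # none found below the cap; the sketch draws no conclusion
        ns.append(n_k)
        us.append(u - Fraction(1, n_k))
    return us, ns


us, ns = descend(Fraction(3), 3)
print(us)
print(ns)
```

Under these assumptions the upper bounds decrease toward the supremum from 
above while the denominators nₖ grow, as the proof predicts. 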


So far, we have completed this much of the Big Theorem: 


[Diagram: Least Upper Bound Property ⇒ Nested Intervals Property ⇒ 
Bolzano-Weierstrass Theorem ⇒ Least Upper Bound Property.] 

Let us reflect for a moment on the Big Question—the difference between the 
rational and real numbers. We showed at the end of Chapter 6 that the rational 
numbers do not have the Nested Intervals property (by finding a carefully chosen 
nest). Observe that the set of left endpoints of the intervals in that nest (thought 
of as a subset of Q) constitutes both a nonempty set that is bounded above but has 
no supremum and a bounded, infinite set with no cluster point (you will verify 
these claims in Exercise 7.7.3). By reinterpreting the same example, we see that 
neither the Least Upper Bound property nor the Bolzano-Weierstrass theorem 
holds in the rational numbers. 
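
A small computation can make the failure of the Least Upper Bound property in Q concrete. The sketch below does not use the nest from Chapter 6; it assumes instead the familiar set {q in Q : q*q < 2}, and the helper name `smaller_upper_bound` is ours. Given any positive rational upper bound u for that set (so u*u > 2), it produces a strictly smaller rational upper bound, which suggests why no rational number can be the least one.

```python
from fractions import Fraction

def smaller_upper_bound(u):
    """Given a positive rational u with u*u > 2, return a smaller rational
    that is still an upper bound for {q in Q : q*q < 2}."""
    if u <= 0 or u * u <= 2:
        raise ValueError("u must be a positive rational with u*u > 2")
    # Average u with 2/u: the result v satisfies v < u and v*v > 2.
    return (u + 2 / u) / 2

u = Fraction(2)
for _ in range(4):
    v = smaller_upper_bound(u)
    assert v < u and v * v > 2   # still an upper bound, but strictly smaller
    u = v
```

All arithmetic is done with exact fractions, so every bound produced really is a rational number.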


EXERCISES 7.7 


1. (a) Verify the statement in the proof of Theorem 7.6: "$n_k > n_{k-1}$ for all k, and
so $\sup N = \infty$, and $\inf\{1/n_k : k \in \mathbb{N}\} = 0$."


(b) Find where the Archimedean property is used in the proof of Theorem 
7.6. 


(c) Show that the process described in this proof ends after finitely many
steps if and only if the difference between $u_0$ and $u$ is rational.


2. Suppose U and S are sets with the properties: (i) each element of U is an
upper bound for S; (ii) for any $\varepsilon > 0$, there are elements u of U and s of S
with $|u - s| < \varepsilon$.


(a) Show that inf U = sup S.


(b) Find where this can be inserted as a lemma in the proof of Theorem 7.6. 


3. Verify the claims made in the final paragraph of the chapter. 


1 Some authors use the phrase "accumulation point" or "limit point," while others use the same words
with slight differences in meaning. One must be careful to check how a particular author uses these terms. 


2 It is best to remember the Bolzano-Weierstrass theorem as: Every bounded, infinite set has a cluster 
point, but of course we must state our result as we have to keep it in the context of the Big Theorem. 


3 A note on the structure of the Big Theorem: The Archimedean property [which is stated in part (b) of
the Big Theorem] implies itself [stated in part (c)]. 


Chapter 8 


Topology of the Real Numbers 


8.1 OPEN SETS 


The importance of intervals in our investigations suggests it might be 
worthwhile to generalize their properties. We will list some things we know 
about open and closed intervals, and then create definitions of "open" and 
"closed" that can apply to sets other than intervals. This is a roundabout way of 
doing things, but it will be very fruitful. Just about everything we have done has 
had something to teach us about open intervals. For instance: 


(1) An open interval is a set of the form $\{x : a < x < b\}$ for some numbers a
and b (a may be replaced by $-\infty$ or b by $\infty$).

(2) An open interval doesn't contain its supremum or infimum, even if it has 
one. 


(3) An open interval is a neighborhood of each of its points. 


(4) An open interval (one that isn't the whole line) has a cluster point that it 
doesn't contain. 


(5) An open interval is uncountable. 


We want to make one of these statements our definition of "open set." Which
would work best? The first is much too specific. If we took it as our definition, 
we wouldn't get any open sets other than the intervals themselves. The second is 
better, but there are sets that would be open if we adopted this as our definition 
that we might not want. The set $(0, 1) \cup [2, 3)$ doesn't contain its supremum or
infimum, but it doesn't look much like an open interval when we look near the 
points 1 and 2, and we probably shouldn't call it open. The fifth property is too 
general. The interval [1, 2] is uncountable, but we probably don't want it to be 
open. This leaves us with the third statement, which we take as our definition: 


DEFINITION 8.1: A set is open if it is a neighborhood of each of its points. 


If we combine this definition with the definition of neighborhood, we find that a
set A is open if, for every $x \in A$, there is an $\varepsilon_x > 0$ with
$(x - \varepsilon_x, x + \varepsilon_x) \subseteq A$. Take careful note of the quantifiers.


EXAMPLES 8.1: 1. Open intervals are open sets. The whole real line is open.


2. The empty set is open. Since $\emptyset$ contains no points at all, it does not contain any
point of which it is not a neighborhood. 


3. Let $A = (0, 1) \cup (3, 4)$. This is a good time to recall a couple of our basic
techniques of proof. The definition of open is universally quantified, and so to
prove that the set A is open, we should begin by giving a name to an element of
A: Let $x \in A$. Now we need to show that the definition of open is satisfied for
this particular x, that is, we must show that A is a neighborhood of x. Now A is
given as the union of two sets, which means that the hypothesis of this statement
($x \in A$) can be rewritten: $x \in (0, 1)$ or $x \in (3, 4)$. The "or" in the hypothesis
indicates a proof by cases:


Case 1: $x \in (0, 1)$. We know that (0, 1) is a neighborhood of x since (0, 1) is an
open interval. Exercise 4.10.1 says that if U is a neighborhood of x and $U \subseteq V$,
then V is a neighborhood of x. But $(0, 1) \subseteq A$, and so A is a neighborhood of x.
Notice that Case 2, $x \in (3, 4)$, is resolved in the same way since (3, 4) is an
open interval and $(3, 4) \subseteq A$.


4. B = [0, 1) is not open. Every $\varepsilon$-neighborhood of $0 \in B$ contains a point
outside B (to be specific, the point $-\varepsilon/2$).
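
The advice to take careful note of the quantifiers can be illustrated numerically: the $\varepsilon$ that works depends on x. The sketch below assumes the set is given as a finite list of open intervals with rational endpoints; the function name `epsilon_for` is hypothetical and is not part of the text.

```python
from fractions import Fraction

def epsilon_for(x, intervals):
    """Return an eps > 0 with (x - eps, x + eps) inside the union of the
    given open intervals (a, b), or None if x lies in none of them."""
    for a, b in intervals:
        if a < x < b:
            # The distance from x to the nearer endpoint works as eps.
            return min(x - a, b - x)
    return None

# The set A = (0, 1) U (3, 4) of Example 3.
A = [(Fraction(0), Fraction(1)), (Fraction(3), Fraction(4))]
```

As x moves toward an endpoint, the value returned shrinks toward 0, which is why no single $\varepsilon$ can serve every point of A.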


We didn't use all we know about open intervals in Example 3 (for instance, we 
didn't refer to endpoints). Because of this, we can deduce much more from the 
proof than we originally stated. The only property of open intervals used was 
that they are neighborhoods of each of their points. But this is just the definition 
of an open set! By precisely the same proof, we obtain: 


THEOREM 8.2: The union of two open sets is open. ∎


The crucial step in Example 3 was the application of Exercise 4.10.1, which says
that if $A \subseteq B$ and A is a neighborhood of x, then B is a neighborhood of x. If B
is formed by the union of A with any collection of sets, then $A \subseteq B$. We may
use the same proof again to establish that: 


THEOREM 8.3: The union of any collection of open sets is open. ∎


It is not true that the intersection of any collection of open sets is open: Let
$U_n = (-1/n, 1/n)$, n = 1, 2, .... Then $U_n$ is open for each n, but $\bigcap_n U_n = \{0\}$, which is
not open. On the other hand, we have shown (in Exercise 4.10.2) that the
intersection of a finite collection of neighborhoods of a point x is a neighborhood
of x. It follows that


THEOREM 8.4: The intersection of any finite collection of open sets is open. ∎
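
The counterexample preceding Theorem 8.4 rests on the fact that every nonzero x is eventually excluded from $U_n = (-1/n, 1/n)$, while 0 is never excluded. A minimal sketch, assuming rational (or integer) inputs and a hypothetical helper name `first_excluding_n`:

```python
from fractions import Fraction
from math import ceil

def first_excluding_n(x):
    """Smallest natural number n with x not in (-1/n, 1/n).
    Returns None when x == 0, since 0 belongs to every U_n."""
    if x == 0:
        return None
    # x lies outside (-1/n, 1/n) exactly when |x| >= 1/n, i.e. n >= 1/|x|.
    return max(1, ceil(1 / abs(Fraction(x))))
```

Any finite intersection $U_1 \cap \cdots \cap U_m$ is just $U_m$, an open interval; it is only the infinite intersection that collapses to {0}.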


EXERCISES 8.1 


1. To use Exercise 4.10.1 in the last step of the proof of Theorem 8.2, we need
to know that $A \subseteq A \cup B$. But this holds no matter what B is. Does the proof
show, then, that any set that contains an open set is open? 


2. Prove Theorem 8.4. 


3. Do an induction to show that the union of a finite collection of open sets is 
open. 


4. Show in detail that $\bigcap_{n=1}^{\infty} \left(-\frac{1}{n}, 1 + \frac{1}{n}\right) = [0, 1]$.


5. The definition of open set is quantified: "$\forall x\, \exists \varepsilon$ ...." Suppose this is reversed
to "$\exists \varepsilon\, \forall x$ ...." Show that the only nonempty set satisfying this new condition
is the whole real line. 


8.2 GENERAL TOPOLOGIES 


We have found that open sets have certain properties, which we bring together to 
form an important definition: 


DEFINITION 8.5: Let X be a set and T a collection of subsets of X such that:
(i) $X \in T$ and $\emptyset \in T$.
(ii) The union of any collection of sets from T is in T.
and (iii) The intersection of any finite collection of sets from T is in T.


Then T is called a topology on X (this explains the title of the chapter). The pair
(X, T) is called a topological space, and the elements of T (which are themselves
sets, remember) are called open subsets of X. 
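
On a finite set, the three conditions of Definition 8.5 can be checked mechanically. The following sketch assumes X is finite (so that closure under pairwise unions and intersections is enough) and uses a hypothetical function name, `is_topology`; it is an illustration only, not part of the text's development.

```python
from itertools import combinations

def is_topology(X, T):
    """Check Definition 8.5 for a collection T of subsets of a finite set X."""
    X = frozenset(X)
    T = {frozenset(s) for s in T}
    # (i) X and the empty set belong to T (and every member is a subset of X).
    if X not in T or frozenset() not in T or any(not s <= X for s in T):
        return False
    # (ii), (iii): T is finite, so pairwise closure gives closure under all
    # unions and all finite intersections by induction.
    return all((a | b) in T and (a & b) in T for a, b in combinations(T, 2))

X = {1, 2, 3}
nested = [set(), X, {1}, {1, 2}]      # closed under unions and intersections
broken = [set(), X, {1}, {2}]         # {1} U {2} = {1, 2} is missing
```

Here `is_topology(X, nested)` holds, while `is_topology(X, broken)` fails on condition (ii).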


Our goal here is not to study topology for its own sake (there are exercises 
throughout the chapter about it), but if you look at an introductory topology text 
after finishing this one, you will find many familiar words and ideas. There is 
one idea from topology, more an attitude than a theorem, that will be of some 
importance to us. A proof or definition that is topological, that is, one that uses 
only facts about open sets, is usually preferable to one that uses other structures 
(such as order, algebra, or distance). There is no specific reason for this, though 
any proof we can do "topologically" will not have to be redone when we go on 
to more abstract topics (in the same way that our early proof that there is only 
one additive identity in a field need never be done again). There is also a certain 
"elegance" in doing more with less. This lofty notion will become more palatable 
when we discover that many proofs are easier when done with more abstract 
tools! 


EXERCISES 8.2 


1. (a) Let S = {1, 2, 3} and let $T = \{\emptyset, \{1, 2, 3\}, \{1, 2\}, \{3\}\}$. Show that T is a
topology on S.


(b) Let S = {1, 2, 3}, and let $T = \{\emptyset, \{1, 2, 3\}, \{1, 2\}, \{1, 3\}\}$. Show that T is
not a topology on S.


(c) Let S= {1, 2}. Find all sets of subsets of S (there are 16 of them). Which 
are topologies? 


(d) Repeat (c) with S = {1, 2, 3}.


(e) Show that $T = \{S \subseteq \mathbb{R} : \mathbb{R} \setminus S \text{ is finite}\} \cup \{\emptyset\}$ is a topology on $\mathbb{R}$. This is
called the finite complement topology.


(f) Notice in (e) that the complement of $\emptyset$ is not finite. Why is it included in T?


(g) If X is any set, show that $T = \{S \subseteq X : X \setminus S \text{ is finite}\} \cup \{\emptyset\}$ is a topology on X.


2. (a) Show that a set is dense in the real line if its intersection with any nonempty open
set is nonempty (the definition of "dense" requires only that the intersection 
of the set with any open interval is nonempty). 


(b) Show that the natural numbers are not dense in the real line. 


(c) Let T consist of $\emptyset$, $\mathbb{R}$, and all sets of the form $(a, \infty)$ for some $a \in \mathbb{R}$. Show
that T is a topology on $\mathbb{R}$.


(d) Show that the natural numbers are dense in the real line (using the
definition given in (a) of this problem) if it is given this topology.


(e) Let T consist of $\emptyset$, $\mathbb{R}$, and all sets of the form $[a, \infty)$ for some $a \in \mathbb{R}$. Show
that T is not a topology on $\mathbb{R}$.


3. (a) If X is a set, show that $\mathcal{P}(X)$ and $\{\emptyset, X\}$ are topologies on X. The former is
called the discrete topology. The latter, in contrast, is called the indiscrete
topology. Notice that the discrete topology, which consists of every subset of
X, is the "biggest" topology on X (in the sense that it contains the most sets),
while the indiscrete topology is the smallest collection of subsets that can 
possibly be a topology, since it contains only those two sets that any 
topology must contain. 


(b) Give an example of a topological space having a finite, dense subset (you 
must give a set and a topology). 


(c) Show that any nonempty set is dense in any space having the indiscrete 
topology. 

(d) Describe the dense subsets in a space having the discrete topology. 

(e) If X is finite, show that the finite complement topology on X is the same
as the discrete topology. (Think carefully about what "the same" means in
this setting. On the simplest level, a topology is a set.)


(f) If X is infinite, show that the finite complement topology on X is not the
same as the discrete topology. 


8.3 CLOSED SETS 


Here are some of the things we have learned about closed intervals: 


(1) A closed interval is a set of the form $\{x : a \le x \le b\}$, $\{x : a \le x < \infty\}$, or
$\{x : -\infty < x \le b\}$ for some a and b, or is the whole line.

(2) A closed interval contains its supremum and/or infimum, if it has one. 

(3) A closed interval contains all its cluster points. 

(4) The Nested Intervals property holds for closed, bounded intervals. 


If we are looking for a general definition of "closed set," we may reject the first
and second of these as before: (1) is too specific, and (2) would admit "closed"
sets we probably don't want. The intervals in the Nested Intervals property had to
be closed and bounded, but there are unbounded closed intervals. Condition (4)
is too restrictive (although closed, bounded sets will be very important to us
later). This leaves us with:


DEFINITION 8.6: A set is closed if it contains all its cluster points (that is, if
$S' \subseteq S$).


EXAMPLES 8.3: 1. Closed intervals and the whole real line are closed sets. 
2. The empty set is closed since it has no cluster points. 
3. $A = [0, 1] \cup [3, 4]$ is closed.


4. B = [0, 1) is not closed since it has x = 1 as a cluster point, but $1 \notin B$. This set
is also not open. "Not closed" is not the same as "open."


5. $H = \{1, \frac{1}{2}, \frac{1}{3}, \ldots\}$ is not closed since $0 \notin H$ is a cluster point of H.


6. $S = H \cup \{0\}$ is closed.


These examples suggest some important results. The observation in the second 
example may be made into a theorem, telling us, among other things, that finite 
sets are closed: 


THEOREM 8.7: A set having no cluster points is closed. ∎


Theorem 8.3 does not hold for closed sets: $U_n = [1/n, 2]$ is closed for each $n \in \mathbb{N}$,
but $\bigcup_n U_n = (0, 2]$, which is not closed. But all is not lost. By Exercise 7.5.6.a,
$(A \cup B)' = A' \cup B'$. If $A' \subseteq A$ and $B' \subseteq B$, then $(A \cup B)' = A' \cup B' \subseteq A \cup B$,
and so $A \cup B$ is closed. By induction, we have:


THEOREM 8.8: The union of finitely many closed sets is closed. ∎


Exercise 7.5.6.a does not hold for infinite unions, but Exercise 7.5.6. says that
$\left(\bigcap_\alpha Y_\alpha\right)' \subseteq \bigcap_\alpha (Y_\alpha)'$, which leads us to the following:


THEOREM 8.9: The intersection of any collection of closed sets is closed. ∎


So far we have seen that: 


(i) The empty set and the whole real line are closed. 


(ii) The intersection of any collection of closed sets is closed.
and (iii) The union of a finite collection of closed sets is closed. 


This would seem to be a nice mirror image of the similar results for open sets. 
We could define a topology beginning with closed sets, but it is not often done 
that way. This symmetry certainly suggests some connection between open sets 
and closed sets, though. The connection is actually very strong, as seen in the 
following theorem. Since this is an "if-and-only-if" theorem, we could have 
defined either "open" or "closed" and used this theorem as the definition of the 
other. Theorem 8.10 is the traditional definition of "closed." 


THEOREM 8.10: A set is open if and only if its complement is closed. 


PROOF: We will use the forward-backward method on the "only if" part of this 
proof. 


Let A be open.
→ A is a neighborhood of each of its points.


→ C(A) contains all its cluster points.


The definition of "closed" is a universally quantified statement about cluster
points. We need to give a name to a specific cluster point of C(A) and prove
something about that particular point:


Let A be open.
A is a neighborhood of each of its points.
→ Let x be a cluster point of C(A).


→ $x \in C(A)$.
C(A) contains all its cluster points.
Then C(A) is closed.


We insert the definitions of "cluster point" and "complement":


A is a neighborhood of each of its points.
Let x be a cluster point of C(A).
→ Every neighborhood of x contains infinitely many points of C(A).


* * *


→ $x \notin A$.
$x \in C(A)$.
C(A) contains all its cluster points.


We make an observation combining two of our statements, remember the
definition of open sets, and find that we are done:


A is a neighborhood of each of its points.
Let x be a cluster point of C(A).
Every neighborhood of x contains infinitely many points of C(A).
→ If A were a neighborhood of x, it too would contain infinitely many points of
C(A).
→ This is not the case, and so A is not a neighborhood of x.
→ But then, since A is open ...
$x \notin A$.
$x \in C(A)$.
C(A) contains all its cluster points.
Then C(A) is closed.


We now continue with the other half of the proof. Suppose C(A) contains all of
its cluster points [that is, C(A) is closed], and let $x \in A$. We must show that A is a
neighborhood of x (to show that A is open). Since $x \notin C(A)$, x is not a cluster
point of C(A). Therefore there is a neighborhood of x that contains only finitely
many points of C(A), none of which is x itself; shrinking it to exclude those points
gives a neighborhood of x that has no point in common with C(A). This neighborhood
is contained in A, and since A contains a neighborhood of x, A is a neighborhood of x. ∎


In everyday language, things normally can't be both open and closed. 
Furthermore, if an object can be either open or closed, it usually must be one or 
the other ("ajar" notwithstanding). In mathematics, "closed" does NOT mean 
"not open." The empty set and the real line are each both open and closed (we 
will see later that they are the only sets with this property), while the set [0, 1) is 
neither open nor closed. We should not read into the words "open" and "closed" 
meanings other than those we have given them. 


EXERCISES 8.3 


1. (a) Show that a nonempty closed, bounded set contains its supremum and 
infimum. 


(b) Is the converse of this true? 


2. The closure of a set S, denoted $S^-$, is the intersection of all closed sets
containing S. The interior of S, denoted $S^\circ$, is the union of all open sets
contained in S. The boundary of S is $S^- \setminus S^\circ$, and is denoted $\partial S$.


(a) Show that $S^-$ is the smallest closed set containing S, and $S^\circ$ is the largest
open set contained in S. (The meanings of "smallest" and "largest" were 
discussed in Exercise 1.15.7.) 


(b) Show that $x \in \partial S$ if and only if every neighborhood of x contains points of
S and of C(S) (elements of $\partial S$ are called boundary points).


(c) Give an example of a countable set with an uncountable boundary. 
(d) Let I = (0, 1). Find $I^-$, $I^\circ$, and $\partial I$. Prove your results.


(e) Repeat (d) for I = [0, 1), J = [0, 1], and $H = \{1, \frac{1}{2}, \frac{1}{3}, \ldots\}$.


(f) Show that S is closed if and only if $S = S^-$, and S is open if and only if $S = S^\circ$.


(g) Show that $\partial S$ is closed.


(h) Explain why the following "proof" of the first half of part (a) is invalid
[you should pay especially close attention if this was your answer to part
(a)!]:
Let C be the smallest closed set containing S. The intersection of all
closed sets containing S can't be any larger than C because C is one of the
sets being intersected. Thus $S^- \subseteq C$. On the other hand, $S^-$ is a closed set
containing C, and C is the smallest such set, and so $C \subseteq S^-$, and thus
$C = S^-$.


(i) The flaw in part (h) is very similar at its root to the flaw in the following
"proof" that 1 is the largest natural number(!) Explain.


Let n be a natural number greater than 1. Then $n^2 > n$, and so n is not the
largest natural number. Consequently 1 is the largest natural number. 


(Notice that the argument here is sound, and so the flaw must have 
something to do with what happened before the argument began.) 


3. (a) Find the interior, closure, and boundary of the rational numbers if the real 
numbers are given the standard topology. 


(b) Find the interior, closure, and boundary of the rational numbers if the real 
numbers are given the topology in Exercise 8.2.2.c.


4. (a) Let S be a bounded set. Show that inf S and sup S are boundary points of 
S. 


(b) If S is nonempty and S has no boundary points, show that S is unbounded.


(c) Show that an interior point of a set can't be a boundary point. 


5. (a) Construct a set for which the set, its closure, the complement of its 
closure, and the closure of the complement of its closure are all different. 


(b) Suppose we repeat this over and over: closure → complement → closure
→ .... Is there a limit to how many different sets can be obtained in this way?
(Hint: If A is the closure of an open set, show that $\left(C\left((C(A))^-\right)\right)^- = A$. This is
called Kuratowski's problem. The answer is 14. Try to construct a set for
which this number is attained.)
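
The definitions in Exercise 2 can be carried out literally in a finite topological space, where "all closed sets containing S" is a finite list. The sketch below assumes a finite set X with a topology T given as a collection of subsets, and it takes the closed sets to be the complements of the members of T (the characterization in Theorem 8.10). The function names are ours.

```python
def closure(X, T, S):
    """Intersection of all closed sets (complements of members of T) containing S."""
    X, S = frozenset(X), frozenset(S)
    result = X
    for U in T:
        closed = X - frozenset(U)
        if S <= closed:
            result &= closed
    return result

def interior(X, T, S):
    """Union of all open sets (members of T) contained in S."""
    S = frozenset(S)
    result = frozenset()
    for U in T:
        if frozenset(U) <= S:
            result |= frozenset(U)
    return result

def boundary(X, T, S):
    """Closure minus interior, as in Exercise 2."""
    return closure(X, T, S) - interior(X, T, S)

X = {1, 2, 3}
T = [set(), X, {1}, {1, 2}]   # a topology on X chosen for illustration
```

In this small space the closed sets are X, the empty set, {2, 3}, and {3}, so for example the closure of {2} is {2, 3} and its interior is empty.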


8.4 THE STRUCTURE OF OPEN SETS 


Looking at the previous theorem, we might guess that the study of closed sets
and the study of open sets amount to much the same thing. In view of the
examples, though, we see there might be some differences. A nonempty open 
set, since it contains an open interval, is uncountable. Closed sets can have all 
sorts of cardinalities. Are closed sets somehow more complicated than open 
sets? We will find that, in a way, they are. We show now that open sets are quite 
predictable in their structure, and we will find later (see Exercise 8.4.7) that 
closed sets can be very peculiar. This is a long proof, but it uses a variety of 
techniques and is a good detective story. 


THEOREM 8.11: A nonempty set S is open if and only if there is a countable
collection of mutually disjoint open intervals $\{U_1, U_2, \ldots\}$ such that $S = \bigcup_n U_n$.


PROOF: The "if" part is already established since any union of open sets is 
open. There is much to do in the other direction. We must construct a collection 
of intervals from the given set S. Our first step is to associate with each point in 
S the largest open interval containing it and contained in S. How can we go about 
this? Suppose S is a single open interval, say (0, 1), and consider a point in it, 
say 1/3. Can we distinguish points less than 1/3 that are in the interval from 
those that are not in the interval without making any reference to the endpoints? 
(We're looking for the endpoints!) If a < 1/3 and $a \in (0, 1)$, all numbers between
a and 1/3 are also in (0, 1). But if a < 0, there are points between a and 1/3 that
are not in (0, 1). Whether a point is in a set or not is something we can check.
Notice that $0 = \inf\{a < 1/3 : (a, 1/3] \subseteq (0, 1)\}$.

For each $x \in S$, let $A_x = \{a : a < x \text{ and } (a, x] \subseteq S\}$. Since S is an open set, $A_x \neq \emptyset$.
Now $A_x$ is either bounded below or it is not. If $A_x$ is bounded below, let $a_x = \inf A_x$;
otherwise let $a_x = -\infty$ (this is acceptable only because $a_x$ is to be an endpoint
of an interval). Likewise, we let $B_x = \{b : b > x \text{ and } [x, b) \subseteq S\}$, and $b_x = \sup B_x$
or $+\infty$, as appropriate. The collection $\{(a_x, b_x)\}$ is essentially what we're looking
for. We show first that $S = \bigcup_{x \in S} (a_x, b_x)$. Note that this is a set-equality problem.

Let $x \in S$. Since S is open, there is an $\varepsilon > 0$ with $(x - \varepsilon, x + \varepsilon) \subseteq S$. In
particular, $(x - \varepsilon, x] \subseteq S$, so that $a_x \le x - \varepsilon < x$ (since $x - \varepsilon$ is an element of the
set of which $a_x$ is the infimum). Similarly, we find that $x < x + \varepsilon \le b_x$. Thus
$x \in (a_x, b_x)$, and $S \subseteq \bigcup_{x \in S} (a_x, b_x)$. Now suppose $y \in \bigcup_{x \in S} (a_x, b_x)$. There is an
$x_0 \in S$ so that $y \in (a_{x_0}, b_{x_0})$. We may assume $a_{x_0} < y < x_0$. Since $y > a_{x_0} = \inf A_{x_0}$,
there is an $a \in A_{x_0}$ with y > a. This means $y \in (a, x_0] \subseteq S$, so $y \in S$. Thus
$\bigcup_{x \in S} (a_x, b_x) \subseteq S$.


Are we done? Not by a long shot. If S = (0, 1), we have $(a_x, b_x) = (0, 1)$ for
every $x \in S$. The intervals $(a_x, b_x)$ are not mutually disjoint, and there are
uncountably many of them. There is hope, though, because the same interval
appears infinitely many times. We will show that this must happen, in the sense
that any two of these intervals are either disjoint or are the same. We first show
$a_x \notin S$ for any x. This is clear if $a_x = -\infty$, and so we may assume $a_x \in \mathbb{R}$. If
$a_x \in S$, there is an $\varepsilon > 0$ with $(a_x - \varepsilon, a_x + \varepsilon) \subseteq S$. Since $a_x = \inf A_x$, there is a
$y \in A_x$ with $a_x \le y < a_x + \varepsilon$. Then $(a_x - \varepsilon, x] = (a_x - \varepsilon, a_x + \varepsilon) \cup (y, x] \subseteq S$, and so
$a_x - \varepsilon \in A_x$, contradicting the way $a_x$ was chosen. Similarly, $b_x \notin S$. Since
$S = \bigcup_{x \in S} (a_x, b_x)$, none of the points $a_x$ or $b_x$ are contained in $(a_z, b_z)$ for any z. It
follows from Exercise 4.9.2 that any two of these intervals are either disjoint or
identical. We will agree to list each interval of $\{(a_x, b_x)\}$ only once.


We have now shown that S can be written as a union of mutually disjoint 
open intervals. It remains to show that there are only countably many of them. 
We will put $\{(a_x, b_x)\}$ into one-to-one correspondence with a set of rational
numbers. Associate with each interval $(a_x, b_x)$ a rational number, say $q_x$, that it is
guaranteed by the Density theorem to contain.¹ If $(a_x, b_x) \neq (a_y, b_y)$, we have
$(a_x, b_x) \cap (a_y, b_y) = \emptyset$, so that $q_x \neq q_y$. The association between the intervals
$(a_x, b_x)$ and the numbers $q_x$ is thus one-to-one (it is certainly not onto, but this doesn't
matter). Since $\{q_x\} \subseteq \mathbb{Q}$, the set $\{q_x\}$ is countable, and consequently so is the
collection $\{(a_x, b_x)\}$. ∎


The final step of this proof can be modified to show that any collection of 
mutually disjoint open sets in the real line is countable. We can sometimes use 
Theorem 8.11 to establish results for open sets by first proving them for open 
intervals, which is often an easier task (see Exercise 8.4.3). 
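
For an open set that is already presented as a finite union of bounded open intervals, the decomposition promised by Theorem 8.11 can be computed directly. This is only a sketch of that special case (the proof above handles arbitrary open sets), and the function name `components` is hypothetical.

```python
def components(intervals):
    """Write a finite union of bounded open intervals (a, b) as a union of
    mutually disjoint open intervals, returned in increasing order."""
    ordered = sorted((a, b) for a, b in intervals if a < b)
    result = []
    for a, b in ordered:
        # Merge only on genuine overlap. (0, 1) and (1, 2) stay separate
        # because the point 1 is not in their union.
        if result and a < result[-1][1]:
            result[-1] = (result[-1][0], max(result[-1][1], b))
        else:
            result.append((a, b))
    return result
```

For instance, (0, 1), (0.5, 2), (2, 3), and (3, 4) reduce to the three disjoint intervals (0, 2), (2, 3), and (3, 4). The strict inequality in the merge test is the design choice that matters: open intervals that merely share an endpoint are not joined.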


EXERCISES 8.4 


1. (a) Show that the sets $A_x$ and $B_x$ in the proof of Theorem 8.11 are not empty.


(b) The third paragraph of the proof of Theorem 8.11 would have gone more 
quickly if we had said "since $a_{x_0} < y < x_0$, $y \in S$." But this would not be
correct. Why? 


2. (a) If the set S in Theorem 8.11 is unbounded, is the collection of open
intervals produced necessarily infinite?


(b) If the set S in Theorem 8.11 is bounded, is the collection of open intervals 
produced necessarily finite? 


3. (a) Show that any open interval is a countable union of closed intervals (they
need not be disjoint). 


(b) Show that any open set is a countable union of closed intervals. 


(c) Is it possible to represent an open interval as a union of countably many 
disjoint closed intervals? 


(d) What if we allow the collection of closed intervals in (c) to be 
uncountable? (Look for an easy answer.)


(e) What if we allow the collection of closed intervals in (c) to be 
uncountable, but require that each of them have positive length? (See 
Exercise 13.2.4 after you have considered this.) 


4. (a) Adjust the end of the proof of Theorem 8.11 to show that any collection
of mutually disjoint open subsets of the real line is countable. 


(b) Show that this is not true for closed sets. 


5. Suppose that (X, T) is a topological space having a countable, dense subset.
Show that any collection of mutually disjoint, open subsets of X is countable. 
(The meaning of "dense" in this context is defined in Exercise 8.2.2.) 


6. A topology is called first countable if it has the following property: For each
point x in the topological space, there is a countable collection of open sets
$\{U_n^x\}$, each containing x and such that if U is any open set containing x, there
is an n so that $U_n^x \subseteq U$. The collection $\{U_n^x\}$ is called a local neighborhood
base at x. Notice that the collection $\{U_n^x\}$ probably changes from point to
point. A topology is called second countable if there is a single countable
collection $\{U_n\}$ such that, if x is in the space and U is an open set containing
x, then there is an n so that $x \in U_n \subseteq U$. Such a collection $\{U_n\}$ is called a
base (or basis) for the topology. Note that a basis need not be countable.


(a) Show that the standard topology on the real line is first countable. 
(b) Show that the standard topology on the real line is second countable. 


(c) Show that any second countable topology is also first countable. 


(d) Write the definitions of first and second countable in standard symbolic 
form. 


(e) Write the negations of these two definitions in standard symbolic form. 


(f) Say (in words) what it means for a topological space to be not first 
countable and to be not second countable. 

(g) If $\{U_n\}$ is a basis for a topology T, show that every element of T can be
written as a union of sets $U_n$.


(h) If (X, T) is a topological space and $\{V_n\}$ is a collection of subsets of X
with the property that every element of T can be written as a union of sets,
each of which is the intersection of finitely many of the sets $V_n$, then $\{V_n\}$ is
called a subbasis for T. Show that the collection of open rays is a subbasis
for the standard topology on the real line.


7. This is more a project than an exercise. We will build a very strange object
called the Cantor set. Let $C_0 = [0, 1]$, and $S_1 = (1/3, 2/3)$ (we refer to $S_1$ as
the "open middle third" of $C_0$). Let $C_1 = C_0 \setminus S_1$. Then $C_1$ consists of two
closed intervals. Now each of these intervals also has an open middle third.
Let $S_2 = (1/9, 2/9) \cup (7/9, 8/9)$, the two open middle thirds of $C_1$, and let
$C_2 = C_1 \setminus S_2$. Then $C_2$ consists of four closed intervals. Remove the four open
middle thirds of $C_2$ to obtain $C_3$, and so on. Note that $C_n \supseteq C_{n+1}$ for all n. Let
$C = \bigcap_n C_n$.

(a) Show that C is closed. 

(b) Show that if $x, y \in C$ and x < y, there is a number $z \notin C$ with x < z < y.
(We say "C contains no nontrivial interval." This and part (a) show that the 
analogue of Theorem 8.11 for closed sets fails in a big way.) 

(c) Compute the sum of the lengths of the intervals removed from $C_0$ in the
construction of C. (You have to remember some calculus to do this. Hint: The 
answer is 1.) 


(d) What is the length of C? We don't know a precise definition of "length"
yet, but it might be reasonable to assume that the length of C is: 1 - (the sum
of the lengths of the intervals removed to make C).


(e) Show that C consists of all elements of $C_0$ whose ternary expansion
contains no 1s (see Exercise 6.3.1).


(f) Show that C is uncountable. [This and part (d) show that any connection 
between length and cardinality is a mystery. ] 


(g) Clearly the endpoints of the intervals making up the $C_n$'s are elements of
C. But the set of these endpoints is countable (why?), and so it can't be all of
C. Find one element of C that is not one of these endpoints.


(h) Repeat this construction, but remove "open middle fourths" (or fifths, or 
whatever you wish). State and prove analogues of each part of the problem 
for this set. 


(i) Modify this construction so that the length of the resulting set is not 0. 
(You will need to remove pieces at each stage that are different fractions of 
the length of the intervals remaining in the set.) This is called a "fat" Cantor 
set. It provides examples in many areas of analysis. How many of the other 
results in this exercise do hold for your fat Cantor set? (Compare this with 
Exercise 6.3.1.c.) 


(j) Construct a one-to-one correspondence between C and $C_0$. (Note that
elements of C have ternary expansions like 0.2202220..., while elements of
$C_0$ have binary expansions like 0.1101110....)


(k) Since C is closed, it contains all its cluster points (that is, $C' \subseteq C$). Show
that every point of C is a cluster point of C (so C’ = C). Such a set is called 
perfect. 


(l) Show that every nonempty perfect set is uncountable.


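The first few stages of the construction in Exercise 7 can be generated with exact rational arithmetic. The sketch below is an illustration under our own naming (`cantor_stage`, `removed_length` are hypothetical); it builds the closed intervals making up $C_n$ and measures how much length has been removed from $C_0$ by stage n.

```python
from fractions import Fraction

def cantor_stage(n):
    """Return the closed intervals [a, b] making up C_n, as (a, b) pairs."""
    intervals = [(Fraction(0), Fraction(1))]          # C_0 = [0, 1]
    for _ in range(n):
        next_stage = []
        for a, b in intervals:
            third = (b - a) / 3
            # Remove the open middle third, keeping the two outer thirds.
            next_stage.append((a, a + third))
            next_stage.append((b - third, b))
        intervals = next_stage
    return intervals

def removed_length(n):
    """Total length removed from C_0 in forming C_n."""
    return 1 - sum(b - a for a, b in cantor_stage(n))
```

Each stage doubles the number of intervals and multiplies the remaining length by 2/3, so the removed length after n stages is 1 - (2/3)^n, in line with the hint in part (c).
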
8.5 FUNCTIONS—DIRECT AND INVERSE IMAGES 


One important reason for thinking about topology is its usefulness in the study of 
functions. Before we get to the main issue—continuity—we must learn to look 
at functions from just the right point of view. We usually think of functions in 
terms of plugging in x and getting out y. Can we plug a set into a function? If $f(x) = x^2$,
does f({-1, 0, 1, 2}) mean anything? One way we might interpret this is
simply to plug the elements of the set into the function one at a time. In this case, 
f({-1, 0, 1, 2}) = {0, 1, 4}. 


DEFINITION 8.12: Let $f : A \to B$, and $S \subseteq A$. The direct image of S under f is
given by $f(S) = \{y \in B : \exists x \in S \ni (y = f(x))\}$.


This is illustrated in the following diagram. The set of images, under the
function f, of the elements in the shaded area on the left might be the shaded
region on the right.


[Diagram: a shaded region on the left is carried by f to a shaded region on the right.]


If $f : A \to B$, the direct image of the entire domain, f(A), is sometimes called the
range of f. This might cause some confusion since we have also referred
to B as the range of f in this situation, even if not every element of B is an output
of f. To avoid using the same word for two (possibly different) things, the set B
in the expression $f : A \to B$ is sometimes called the codomain of f. We will write
f(A) for the range of f whenever it is important to distinguish it from the
codomain (which will not be very often). Observe that any function $f : A \to f(A)$
is onto; in fact, "B = f(A)" may be taken as the definition of "$f : A \to B$ is onto."
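
For finite sets, Definition 8.12 and the remark that "B = f(A)" characterizes onto functions translate almost word for word into code. The following is a minimal sketch assuming finite sets; the names `direct_image`, `is_onto`, and `square` are ours.

```python
def direct_image(f, S):
    """f(S): the set of outputs f(x) as x ranges over S."""
    return {f(x) for x in S}

def is_onto(f, A, B):
    """f : A -> B is onto exactly when the direct image f(A) equals B."""
    return direct_image(f, A) == set(B)

def square(x):
    return x * x

domain = {-2, -1, 0, 1, 2}
```

With these definitions, `direct_image(square, {-1, 0, 1, 2})` is {0, 1, 4}, matching the computation at the start of the section, and `square` is onto {0, 1, 4} from `domain` but not onto {0, 1, 2, 3, 4}.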

If $f : A \to B$ is a one-to-one correspondence, it has an inverse function,
denoted $f^{-1}$, whose domain is B and whose range is A and which is defined by:
$x = f^{-1}(y) \iff y = f(x)$. This is the same as saying $f(f^{-1}(y)) = y$ and $f^{-1}(f(x)) = x$. (We
often use y for elements of B and x for elements of A, but this is only a
convenience. We can tell from context whether an element being inserted into a
function is in A or B.) If f is one-to-one, it is a one-to-one correspondence
between A and f(A), and we can always define an inverse function $f^{-1} : f(A) \to A$.
Now if $f : A \to B$ is one-to-one and $S \subseteq f(A) \subseteq B$, the direct image of S under $f^{-1}$
is given by


$f^{-1}(S) = \{x \in A : x = f^{-1}(y) \text{ for some } y \in S\}$.

If we replace "$x = f^{-1}(y)$" with "$y = f(x)$" we find

$f^{-1}(S) = \{x \in A : y = f(x) \text{ for some } y \in S\}$.

Now "$y = f(x)$ for some $y \in S$" may be abbreviated "$f(x) \in S$," and so

$f^{-1}(S) = \{x \in A : f(x) \in S\}$.


Not only is the last version of this statement easier to look at than the original,
we don't have to evaluate an inverse function to use it. In fact, the last
formulation makes sense even if f doesn't have an inverse function (that is, if f is
not one-to-one) and even if S is not a subset of f(A). Remember that both of these
conditions must hold before we can even consider the direct image of S under $f^{-1}$.


DEFINITION 8.13: Let f: A → B and S ⊆ B. The inverse image of S under f
is given by f⁻¹(S) = {x ∈ A : f(x) ∈ S}.

This definition is best remembered: x ∈ f⁻¹(S) ⟺ f(x) ∈ S. The diagram below
illustrates the inverse function of f:

[Figure: the inverse function of f.]

The next diagram illustrates the construction of an inverse image:

[Figure: x is sent to f(x) inside the shaded region on the right; z is sent to f(z) outside it.]

In the second diagram x is an element of the inverse image of the shaded region
on the right, while z is not. The most important thing to note here is that this
diagram resembles the one describing the direct image more than it does the one
describing the inverse function (in particular, both the arrows point to the right).
The inverse function doesn't appear in the picture at all.

Inverse images can always be found, and so they are much more useful to us
than direct images under inverse functions. But now we are using the notation
f⁻¹(S) to stand for two slightly different things: the direct image of S under the
function f⁻¹ and the inverse image of S under the function f. You will show in
Exercise 8.5.2 that this confusion, though real, is not as dangerous as it might
seem. It does, however, influence how we do proofs. The notation f⁻¹(S) will
always mean the inverse image of S under f. Unless we say otherwise, we never
assume that a function has an inverse.


EXAMPLES 8.5: 1. Let f(x) = x² and A = {0, 4}. Then f⁻¹(A) = {0, −2, 2} and
f⁻¹({−1}) = ∅. Note that the inverse image of a set need not have the same
cardinality as the set.

2. For any function f: A → B, we have f⁻¹(∅) = ∅ and f⁻¹(B) = A. This is different
from saying B = f(A), which is not always true. We can't simply apply f to both
sides of such an equality since it is not always the case that f(f⁻¹(S)) = S. You
will examine this issue in Exercise 8.5.4.

3. Let f(x) = sin(x). Then f⁻¹(0) is undefined since f is not one-to-one and so has
no inverse function. On the other hand, f⁻¹({0}) = {0, ±π, ±2π, ...}. (Remember
that 0 and {0} are very different things.)
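The point that an inverse image needs no inverse function can be seen in a small computation. This sketch follows Example 1, with one assumption that is not in the text: a finite set of integers stands in for the real line so that the inverse image can be found by search.

```python
def inverse_image(f, domain, S):
    """Return {x in domain : f(x) in S}; f is only ever applied, never inverted."""
    return {x for x in domain if f(x) in S}

domain = set(range(-5, 6))   # finite stand-in for the real line (an assumption)
square = lambda x: x * x     # not one-to-one, so it has no inverse function

print(inverse_image(square, domain, {0, 4}))   # the three elements 0, -2, 2
print(inverse_image(square, domain, {-1}))     # set(): no x has x*x = -1
```

The helper only ever tests "f(x) in S", which is exactly the reading x ∈ f⁻¹(S) ⟺ f(x) ∈ S.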


We have found two new ways of manipulating sets (direct and inverse images), 
and we should consider the relationships between these processes and the usual 
set operations. We will establish two results here. This theorem suggests that 
inverse images are, in a sense, "better behaved" than direct images, a comforting 
thought when first encountering a new concept. 


THEOREM 8.14: Suppose f: A → B.
(a) If C, D ⊆ A then f(C ∩ D) ⊆ f(C) ∩ f(D).
(b) If E, F ⊆ B then f⁻¹(E ∩ F) = f⁻¹(E) ∩ f⁻¹(F).


PROOF: (a) Let y ∈ f(C ∩ D). By the definition of direct image, there is an
element x of C ∩ D with y = f(x). Then x ∈ C and x ∈ D. It follows that
y ∈ f(C) ∩ f(D).

(b) There are two containment proofs to be done. We will use the forward-backward
method to show that f⁻¹(E ∩ F) ⊆ f⁻¹(E) ∩ f⁻¹(F).

We insert definitions:

Let x ∈ f⁻¹(E ∩ F).
Then f(x) ∈ E ∩ F.
Then f(x) ∈ E and f(x) ∈ F.
Then x ∈ f⁻¹(E) and x ∈ f⁻¹(F).
Then x ∈ f⁻¹(E) ∩ f⁻¹(F).

We may finish the proof with another, similar set argument, or we may observe
that each connective in the above argument is "if and only if" (since each step is
a definition). This means that the entire argument can simply be reversed (it is
very rare that we can do one part of an if-and-only-if proof by simply reversing
the other part, and we can never assume it will work). ∎
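Both parts of Theorem 8.14 can be tried out on finite sets. In the sketch below the sets A, C, D, E, F and the squaring function are hypothetical choices; a check of this kind illustrates the statements but proves nothing.

```python
def direct_image(f, S):
    """f(S) for a finite set S."""
    return {f(x) for x in S}

def inverse_image(f, domain, S):
    """{x in domain : f(x) in S}."""
    return {x for x in domain if f(x) in S}

A = set(range(-3, 4))        # hypothetical finite domain
square = lambda x: x * x

# Part (a): f(C ∩ D) is contained in f(C) ∩ f(D).
C, D = {-1, 0}, {0, 1}
lhs = direct_image(square, C & D)
rhs = direct_image(square, C) & direct_image(square, D)
print(lhs <= rhs)   # True: the containment holds
print(lhs == rhs)   # False: here the two sets differ

# Part (b): inverse images commute with intersection.
E, F = {0, 1, 4}, {1, 4, 9}
both = inverse_image(square, A, E & F)
each = inverse_image(square, A, E) & inverse_image(square, A, F)
print(both == each)   # True
```

In this example f(C ∩ D) = {0} while f(C) ∩ f(D) = {0, 1}: the points −1 and 1 share an image even though neither lies in both C and D.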


The relationship between inverse and direct images and algebraic operations on 
functions is very subtle. Composition, however, works as we would expect. 


THEOREM 8.15: If f: A → B, g: B → C, and S ⊆ C, then
(g ∘ f)⁻¹(S) = f⁻¹(g⁻¹(S)).


PROOF: We will do half of this proof, leaving the rest as Exercise 8.5.6. This is
a good example of how looking at just the right level of detail can make a
problem easier. Let x ∈ (g ∘ f)⁻¹(S). By the definition of composition,
(g ∘ f)(x) = g(f(x)) ∈ S. Since g(something) ∈ S, it must be that (something)
∈ g⁻¹(S), that is, f(x) ∈ g⁻¹(S). Now we have f(something else) ∈ (some set), and
so it must be that (something else) ∈ f⁻¹(some set). That is, x ∈ f⁻¹(g⁻¹(S)), as
desired. ∎
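The identity in Theorem 8.15 can also be checked on a small example. The functions f and g and the sets A, B, and S below are hypothetical choices made for illustration; B is taken to be f(A) purely for convenience.

```python
def inverse_image(f, domain, S):
    """{x in domain : f(x) in S}."""
    return {x for x in domain if f(x) in S}

A = set(range(-3, 4))      # hypothetical finite domain of f
f = lambda x: x * x        # f: A -> B
B = {f(x) for x in A}      # B = f(A), chosen for convenience
g = lambda y: y - 4        # g: B -> C
S = {-4, -3, 0}            # a subset of C

g_after_f = lambda x: g(f(x))
left = inverse_image(g_after_f, A, S)                  # (g ∘ f)⁻¹(S)
right = inverse_image(f, A, inverse_image(g, B, S))    # f⁻¹(g⁻¹(S))
print(left == right)   # True
```

Note the order on the right-hand side: g⁻¹ is applied to S first, and f⁻¹ is applied to the result.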


We have consistently used the observation "x ∈ f⁻¹(S) ⟺ f(x) ∈ S" but no
references to inverse functions. Unless we know that f has an inverse function
(and we almost never will), we can't write expressions like f⁻¹(y), where y is an
element of the range of f. This is our final warning on the subject.


EXERCISES 8.5 


1. Verify that the definition of inverse function "... is the same as saying
f(f⁻¹(y)) = y and f⁻¹(f(x)) = x."

2. If f is one-to-one and onto, show that the two interpretations of the
expression "f⁻¹(S)" are the same. First give them different names, say
U = {x : x = f⁻¹(y) for some y ∈ S} (the direct image under f⁻¹) and
V = {x : f(x) ∈ S} (the inverse image under f). Now show that U = V.

3. (a) If f: A → B is one-to-one and onto, show that f⁻¹ is a function.

(b) Show that the given definition of f⁻¹ doesn't yield a function if f is not
both one-to-one and onto.


4. Show: (a) f(C ∪ D) = f(C) ∪ f(D).
(b) f(C)\f(D) ⊆ f(C\D).
(c) f⁻¹(C ∪ D) = f⁻¹(C) ∪ f⁻¹(D).
(d) f⁻¹(C\D) = f⁻¹(C)\f⁻¹(D). (Note again that inverse images are better
behaved than direct images.)

(e) The containment in (b) might be strict (that is, the sets might not be
equal).

(f) The containment in Theorem 8.14.a might be strict.
(g) f(f⁻¹(C)) ⊆ C, and this containment might be strict.
(h) C ⊆ f⁻¹(f(C)), and this containment might be strict.
(i) If f is one-to-one, show that the sets in (b) are equal.
(j) If the sets in (b) are always equal, is f necessarily one-to-one?

(k) Are there conditions that can be put on the functions or sets in parts (g)
and (h) of this problem and in Theorem 8.14.a that will guarantee that the
sets involved are equal?


5. (a) Examine the direct and inverse images of open and closed intervals under
the functions f(x) = x² and g(x) = x³.

(b) Repeat part (a) for functions x^p and x^q, where p is even and q is odd.

(c) Repeat part (a) for the function f(x) = x² + x or any simple polynomial of
your choosing. Develop a conjecture about inverse images of intervals under
polynomials.


6. Complete the proof of Theorem 8.15. 


7. (a) Suppose f: A → B and that S ⊆ f(A). Show that the cardinality of f⁻¹(S) is
not less than the cardinality of S.


(b) Show that a function is one-to-one if and only if it has the property that 
the inverse image of any set with one element has at most one element. 


(c) Suppose a function has the property that the inverse image of any set with 
two elements is either empty or has precisely two elements. Is such a function 
one-to-one? 


(d) Suppose a function has the property that the inverse image of any 
countable set is countable and the inverse image of any uncountable set is 
uncountable. Is such a function one-to-one? 


8.6 CONTINUOUS FUNCTIONS 


The ε-δ definition of continuity is familiar from calculus:

f is continuous at a if for every ε > 0, there is a δ > 0
so that |f(x) − f(a)| < ε whenever |x − a| < δ.

Here is a picture that illustrates the ε-δ definition of continuity:

[Figure: graph of f near a, with dashed horizontal lines at f(a) + ε and f(a) − ε.]

We wish to find a value of δ so that the portion of the graph above the interval
(a − δ, a + δ) lies between the two dashed horizontal lines. If this is possible for
every ε > 0, the function is continuous at a. It would be in keeping with the
approach we have adopted to have a purely topological characterization of
continuity. Expressions like "|x − a| < δ," with its reference to distance (and
implicit reference to ordering) are the sort of thing we try to avoid by looking at
things topologically. Observe what happens if we consider all the points on the
graph that lie between the two dashed horizontal lines, and all points on the
x-axis that are sent there by f:

[Figure: the same graph, with the points of the x-axis that f sends between the two dashed lines marked.]


The definition of continuity requires there to be an interval (a − δ, a + δ)
contained in the set we have constructed on the x-axis. This means that this set
must be a neighborhood of a. Notice, too, that the set we have constructed on the
x-axis is just the inverse image of (f(a) − ε, f(a) + ε) under f. Moreover, if we had
begun this process with a neighborhood of f(a) (rather than an ε-interval), we
would have arrived at the same conclusion. Such a neighborhood would contain
an ε-interval, and the inverse image of that neighborhood would in turn contain
the inverse image of the ε-interval. Putting all of this together, we see that the
definition of continuity of a function f at a point a may be phrased like this:

f is continuous at a if f⁻¹(U) is a neighborhood of a
whenever U is a neighborhood of f(a).


If the function f is continuous everywhere (for the time being, we are interested
only in functions whose domains are the whole real line), this condition must
hold for all a. In other words, if we begin with a set on the y-axis that is a
neighborhood of each of its points, its inverse image under f must also be a
neighborhood of each of its points.

DEFINITION 8.16: A function f: R → R is continuous if f⁻¹(S) is open
whenever S is open. (We say "Inverse images of open sets are open.")


This definition fulfills our desire for something "purely topological," but the
property being defined here is not quite the same as the one in the ε-δ definition.
(Be sure you see the difference.) We will reconcile the two definitions in
Theorem 8.18. To allay suspicions that we are being led down a garden path,
we will do a short, simple proof. You are invited to supply the ε-δ version.


THEOREM 8.17: Compositions of continuous functions are continuous. 


PROOF: Let f: R → R and g: R → R both be continuous. Consider the inverse
image of an open set, say S, under the composition g ∘ f. By Theorem 8.15,
(g ∘ f)⁻¹(S) = f⁻¹(g⁻¹(S)). Since g is continuous, g⁻¹(S) is open. Since f is
continuous, f⁻¹(g⁻¹(S)) is open. ∎

The brevity of this proof goes a long way toward explaining why we look at
continuity the way we do. There are negative aspects to this. The proof that the
sum of two continuous functions is continuous is not so easy in these terms. An
ε-δ proof is more appropriate in that case.


The use of "whenever" as a quantifier might obscure the logical structure of 
the definition of continuity (in either form). Some care about this now will make 
things easier for us later. "Whenever" is a universal quantifier, and the object 
quantified is the set S. We may write the definition: 


∀S ⊆ R (S is open ⇒ f⁻¹(S) is open).

This makes the definition easy to negate. A function is not continuous if

∃S ⊆ R ∋ (S is open and f⁻¹(S) is not open).


The problem of showing that a function is not continuous is one of finding an 
example of a set with certain properties. 


EXAMPLES 8.6: 1. Let f: R → R be constant, say f(x) = c for all x. Then f is
continuous: Let S be an open subset of R. If c ∈ S, then f⁻¹(S) = R. On the other
hand, if c ∉ S, then f⁻¹(S) = ∅. In either case, f⁻¹(S) is open, and so f is
continuous. (Note how neatly the definition of a topology fits with this proof.)

2. Let f(x) = x. For any set S, f⁻¹(S) = S. If S is open (considered as a subset of the
range) then S is open (considered as a subset of the domain), and so f is
continuous.

3. Let f(x) = 1 if x > 0 and f(x) = 0 if x ≤ 0. As we suspect, f is not continuous.
Let S = (−1/2, 1/2), which is open. Then f⁻¹(S) = (−∞, 0], which is not open.
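Example 3 can be probed numerically. The sketch below shows that 0 lies in the inverse image of S = (−1/2, 1/2), while for each δ tried, the point δ/2 of (−δ, δ) does not. The particular values of δ are arbitrary, and a finite list of trials is evidence rather than a proof that no interval about 0 fits inside the inverse image.

```python
def step(x):
    """f(x) = 1 if x > 0 and f(x) = 0 if x <= 0, as in Example 3."""
    return 1 if x > 0 else 0

def in_S(y):
    """Membership in the open interval S = (-1/2, 1/2)."""
    return -0.5 < y < 0.5

# 0 belongs to the inverse image of S, since f(0) = 0 is in S.
print(in_S(step(0)))   # True

# For each delta tried, delta/2 lies in (-delta, delta) but f(delta/2) = 1
# is not in S, so (-delta, delta) is not contained in the inverse image.
for delta in (1.0, 0.1, 1e-3, 1e-9):
    print(delta, in_S(step(delta / 2)))   # False each time
```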


We will now examine the connection between the ε-δ definition of continuity
and our definition. The ε-δ definition describes continuity at a point, while ours
deals with the whole real line. Recall that a function is continuous on a set
(according to the ε-δ definition) if it is continuous at each point of the set. Let us
say that a function is "continuous(ε)" if it satisfies the ε-δ definition at each point
of the real line, and "continuous(T)" if it satisfies the definition of continuity we
have given here.

THEOREM 8.18: A function f: R → R is continuous(ε) if and only if it is
continuous(T).


We will show that if f is continuous(ε) then it is continuous(T). The other half of
the proof is Exercise 8.6.2. Our proof will test the limits of the forward-backward
method. After establishing the result in this way, we will also present a
"streamlined" version of the proof distilled from the forward-backward version
with all the grace that hindsight can provide. Which is the better proof? The
feeling that elegance is the major portion of quality seems to indicate that the
second one is much better. This feeling is practically the definition of
mathematics. On the other hand, one might argue that the "better" proof is the
one we can find ourselves. This is a personal, aesthetic decision.

Before we begin, we observe that the ε-δ definition of "f is continuous at x"
may be written in the following way:

f is continuous at x if, for any ε > 0, there is a δ > 0 so that
y ∈ (x − δ, x + δ) ⇒ f(y) ∈ (f(x) − ε, f(x) + ε).


PROOF 1: We wish to establish that f is continuous(T). The definition of
continuous(T) is universally quantified, and so the proof must begin and end like
this:

Let S be an open set.
* * *
Then f⁻¹(S) is open.


We will do a few backward steps. First, we insert the definition of "open," as it
relates to f⁻¹(S):

Let S be an open set.
* * *
→ f⁻¹(S) is a neighborhood of each of its points.
Then f⁻¹(S) is open.


The statement we have just inserted is also universally quantified. To prove it,
we must select an element of f⁻¹(S) and show that f⁻¹(S) is a neighborhood of
that particular element:

Let S be an open set.
→ Let x ∈ f⁻¹(S).
* * *
→ f⁻¹(S) is a neighborhood of x.
f⁻¹(S) is a neighborhood of each of its points.
Then f⁻¹(S) is open.


Now we will make a forward step and a backward step. We may insert the
definitions of both "x ∈ f⁻¹(S)" and "f⁻¹(S) is a neighborhood of x." At this
point, we will use a bit of foresight. The main tool we have available, the
definition of continuous(ε), will eventually give us a number that measures a
distance in the domain of f. That number will be called δ. The set f⁻¹(S) is also in
the domain of f. We take this into account when we choose a name for the
number we introduce in our backward step.

Let S be an open set.
Let x ∈ f⁻¹(S).
→ f(x) ∈ S.
* * *
→ ∃δ > 0 ∋ ((x − δ, x + δ) ⊆ f⁻¹(S)).
f⁻¹(S) is a neighborhood of x.
f⁻¹(S) is a neighborhood of each of its points.
Then f⁻¹(S) is open.


Next we use the fact that S is open ...

Let S be an open set.
Let x ∈ f⁻¹(S).
f(x) ∈ S.
→ S is a neighborhood of f(x).
* * *
∃δ > 0 ∋ ((x − δ, x + δ) ⊆ f⁻¹(S)).
f⁻¹(S) is a neighborhood of x.
f⁻¹(S) is a neighborhood of each of its points.
Then f⁻¹(S) is open.


... and the definition of neighborhood [again we think of continuous(ε), which
will require a distance called ε in the range of the function] ...

Let S be an open set.
Let x ∈ f⁻¹(S).
f(x) ∈ S.
S is a neighborhood of f(x).
→ ∃ε > 0 ∋ ((f(x) − ε, f(x) + ε) ⊆ S).
* * *
∃δ > 0 ∋ ((x − δ, x + δ) ⊆ f⁻¹(S)).
f⁻¹(S) is a neighborhood of x.
f⁻¹(S) is a neighborhood of each of its points.
Then f⁻¹(S) is open.


We insert the definition of "subset":

Let S be an open set.
Let x ∈ f⁻¹(S).
f(x) ∈ S.
S is a neighborhood of f(x).
∃ε > 0 ∋ ((f(x) − ε, f(x) + ε) ⊆ S).
→ ∃ε > 0 ∋ ∀z (z ∈ (f(x) − ε, f(x) + ε) ⇒ z ∈ S).
* * *
∃δ > 0 ∋ ((x − δ, x + δ) ⊆ f⁻¹(S)).
f⁻¹(S) is a neighborhood of x.
f⁻¹(S) is a neighborhood of each of its points.
Then f⁻¹(S) is open.


Now we come to the hardest step of the proof. Consider the last line we have
reached by forward steps and look back at the statement of the ε-δ definition just
before the proof. Using this definition, we can force f(y) to be an element of
(f(x) − ε, f(x) + ε) by putting y in just the right place [and such an f(y) would be
one of the z's in the last forward statement in the proof]. Since this can be done
for any ε, it can be done for the particular one we have at hand.

Let S be an open set.
Let x ∈ f⁻¹(S).
f(x) ∈ S.
S is a neighborhood of f(x).
∃ε > 0 ∋ ((f(x) − ε, f(x) + ε) ⊆ S).
∃ε > 0 ∋ ∀z (z ∈ (f(x) − ε, f(x) + ε) ⇒ z ∈ S).
→ ∃δ > 0 ∋ (y ∈ (x − δ, x + δ) ⇒ f(y) ∈ (f(x) − ε, f(x) + ε)).
* * *
∃δ > 0 ∋ ((x − δ, x + δ) ⊆ f⁻¹(S)).
f⁻¹(S) is a neighborhood of x.
f⁻¹(S) is a neighborhood of each of its points.
Then f⁻¹(S) is open.


Let S be an open set.
Let x ∈ f⁻¹(S).
f(x) ∈ S.
S is a neighborhood of f(x).
∃ε > 0 ∋ ((f(x) − ε, f(x) + ε) ⊆ S).
∃ε > 0 ∋ ∀z (z ∈ (f(x) − ε, f(x) + ε) ⇒ z ∈ S).
∃δ > 0 ∋ (y ∈ (x − δ, x + δ) ⇒ f(y) ∈ (f(x) − ε, f(x) + ε)).
→ ∃δ > 0 ∋ (y ∈ (x − δ, x + δ) ⇒ f(y) ∈ S).
* * *
∃δ > 0 ∋ ((x − δ, x + δ) ⊆ f⁻¹(S)).
f⁻¹(S) is a neighborhood of x.
f⁻¹(S) is a neighborhood of each of its points.
Then f⁻¹(S) is open.


Inserting the definition of inverse image and noting that the line so formed is the
definition of "subset," we are done:

Let S be an open set.
Let x ∈ f⁻¹(S).
f(x) ∈ S.
S is a neighborhood of f(x).
∃ε > 0 ∋ ((f(x) − ε, f(x) + ε) ⊆ S).
∃ε > 0 ∋ ∀z (z ∈ (f(x) − ε, f(x) + ε) ⇒ z ∈ S).
∃δ > 0 ∋ (y ∈ (x − δ, x + δ) ⇒ f(y) ∈ (f(x) − ε, f(x) + ε)).
∃δ > 0 ∋ (y ∈ (x − δ, x + δ) ⇒ f(y) ∈ S).
→ ∃δ > 0 ∋ (y ∈ (x − δ, x + δ) ⇒ y ∈ f⁻¹(S)).
∃δ > 0 ∋ ((x − δ, x + δ) ⊆ f⁻¹(S)).
f⁻¹(S) is a neighborhood of x.
f⁻¹(S) is a neighborhood of each of its points.
Then f⁻¹(S) is open. ∎


As always, the forward-backward method did not let us avoid the difficult steps
in the proof, but it made clear where they occurred. This allowed us to focus our
attention on them, and only when it was necessary. Here is the streamlined
version of the proof. You should look back and forth between it and the
forward-backward version to see where the steps come from and why they fit
together as they do (some of the connections are pretty subtle).


PROOF 2: Suppose that f is continuous(ε). Let S be an open set. Let x ∈ f⁻¹(S)
and let y = f(x). Since y ∈ S and S is open, there is an ε > 0 so that
(y − ε, y + ε) ⊆ S. Since f is continuous(ε), there is a δ > 0 so that
f((x − δ, x + δ)) ⊆ (y − ε, y + ε). For each z ∈ (x − δ, x + δ), we have
f(z) ∈ (y − ε, y + ε) ⊆ S. This means (x − δ, x + δ) ⊆ f⁻¹(S), and so f⁻¹(S) is a
neighborhood of x, f⁻¹(S) is open, and f is continuous(T). ∎
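The step "there is a δ > 0 so that f((x − δ, x + δ)) ⊆ (y − ε, y + ε)" can be made concrete for one particular function. The sketch below uses f(t) = t², which is not an example from the text, together with the choice δ = min(1, ε/(2|x| + 1)), a standard one for this function. It then samples points of (x − δ, x + δ); sampling gives evidence only, and the estimate in the docstring is what actually justifies the choice.

```python
def delta_for_square(x, eps):
    """A delta that works for f(t) = t*t at the point x.

    If |y - x| < delta <= 1, then |y + x| < 2|x| + 1, and so
    |y*y - x*x| = |y - x| * |y + x| < delta * (2|x| + 1) <= eps.
    """
    return min(1.0, eps / (2 * abs(x) + 1))

def check(x, eps, samples=1000):
    """Sample (x - delta, x + delta) and test f(y) in (f(x) - eps, f(x) + eps)."""
    f = lambda t: t * t
    delta = delta_for_square(x, eps)
    # midpoints of `samples` equal subintervals, all strictly inside the interval
    ys = [x - delta + 2 * delta * (k + 0.5) / samples for k in range(samples)]
    return all(abs(f(y) - f(x)) < eps for y in ys)

print(check(3.0, 0.01))
print(check(-10.0, 1e-6))
```

In the language of Proof 2, each sampled y lies in f⁻¹((f(x) − ε, f(x) + ε)), which is what it means for that inverse image to contain (x − δ, x + δ).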


EXERCISES 8.6 


1. Give an ε-δ proof of Theorem 8.17.
2. Complete the proof of Theorem 8.18.

3. (a) If f is continuous, show that inverse images of closed sets are closed.

(b) Go back and review your answer to Exercise 8.5.5.

4. Prove or disprove: The function f is continuous(ε) at the point a if and only if
there is a neighborhood U of a such that f is continuous(T) on U.

5. Show that the function f is continuous at the point a if and only if, for every ε
> 0, there is a δ > 0 so that |f(x) − f(y)| < ε whenever x and y are both in the
interval (a − δ, a + δ).


6. (a) Show that if X has the discrete topology, every function whose domain is
X is continuous (regardless of the topology on the codomain). The discrete
topology was defined in Exercise 8.2.3.

(b) Show that if Y has the indiscrete topology, every function whose
codomain is Y is continuous (regardless of the topology on the domain).

(c) Show that constant functions are always continuous.

(d) Show that if X has the indiscrete topology and Y does not, the only
functions f: X → Y that are continuous are constants.

(e) Show that the identity function (f(x) = x) from a topological space to itself
is always continuous.

(f) Suppose X₁ and X₂ are equal as sets but have different topologies (and so
they are not equal as topological spaces). Show that the identity function
f: X₁ → X₂ need not be continuous.

(g) What conditions on the topologies on X₁ and X₂ in part (f) would
guarantee that the identity function is continuous?

8.7 RELATIVE TOPOLOGIES 


Our definition of continuity does not tell us what it means for a function to be 
continuous if its domain is not the whole real line. Just about all of the important 
theorems of calculus concern functions whose domains are closed, bounded 
intervals, and so it is particularly important that we know what it means for such 
functions to be continuous. In view of Definition 8.16, all we really need is to 
define what it means for a subset of an interval to be open. Be warned that the 
following is not standard terminology. 


DEFINITION 8.19: If T ⊆ S ⊆ R, we say T is *open in S if there is an open
subset U of R so that T = S ∩ U.


This should be pronounced "star-open." For now, we reserve "open" to mean "an
open subset of the real line." With a little practice we will be able to distinguish
the difference from context and will just say "open" in both cases. Notice that
this definition applies to any subset of the real numbers, not just intervals. The
collection of *open subsets of a set S is called the relative topology on S or the
topology S inherits from R. You will show in Exercise 8.7.1 that a relative
topology is indeed a topology. The definition of continuity for a function defined
on a subset of the real line is now quite natural:

DEFINITION 8.20: If S, T ⊆ R, a function f: S → T is continuous if the inverse
image of any *open set in T is a *open set in S.


If (X, T) is any topological space, we usually refer to the elements of T as
"open." If we follow this convention, this definition (without the *s) makes sense
for functions between any two topological spaces.


EXAMPLES 8.7: 1. Each natural number, regarded as a one-point set, is a *open
subset of N. For example, {7} = (6.5, 7.5) ∩ N. In fact, every subset of N is
*open. It follows that every function whose domain is N is continuous. This is
true of any set each of whose points is *open. We may interpret this in terms of
pointwise continuity (the ε-δ kind) by saying that if x ∈ S is a point such that
{x} is a *open subset of S, then any function f: S → R is continuous at x. This is
a useful observation. ("{x} is a *open subset of S" is equivalent to "x is an
isolated point of S"; see Exercise 8.7.4.)

2. Let S = [0, 1]. Then (1/2, 1] is *open in S since (1/2, 1] = (1/2, 3) ∩ S, and
(1/2, 3) is open. Note that (1/2, 1] is not an open subset of R and is not a *open
subset of [0, 2]. Whether T ⊆ S is *open depends on both T and S.


3. If f: R → R is continuous, it is also continuous when its domain is restricted
to [0, 1] (see Exercise 8.7.5). In fact, every continuous function on [0, 1] arises
in this way. We may just extend such a function to the whole real line by saying
f(x) = f(1) if x > 1 and f(x) = f(0) if x < 0. It is a much deeper result that this same
process can be carried out for any continuous function whose domain is a closed,
bounded set. This is not true if the domain is an open interval, as the next
example shows.

4. Now let S = (0, 1). You will show in Exercise 8.7.2 that a subset of S is *open
if and only if it is open when considered as a subset of all of R. However, the
function given by f(x) = 1/x is continuous on S, but it can't be obtained as a
restriction to S of a continuous function on R. Win some, lose some. Note also
that the function given by f(x) = 0 for x < 1 and f(x) = 1 for x ≥ 1 is continuous
when restricted to (0, 1) but not when restricted to [0, 1].

5. The function in Example 8.6.3 is continuous on (−∞, 0] and continuous on
(0, ∞), but not on [0, ∞).


EXERCISES 8.7 


1. If S ⊆ R, show that the *open subsets of S are a topology on S.

2. (a) If S is open and T ⊆ S, show that T is *open if and only if T is open.
(b) Give an example to show that this may not be so if S is not open.

(c) If T ⊆ S and T is open, show that T is *open no matter what S is.

3. (a) Suppose Y ⊆ X and f is continuous on X. Show that f is continuous on Y.
(b) Show that f can be continuous on Y but not continuous on X.

(c) If f is continuous on A ∪ B, it is continuous on A and on B.

(d) This is true for the union of any number of sets.
(e) This is not always true for intersections.
(f) If f is continuous on A and on B, it is continuous on A ∩ B.

(g) This is not always true for unions.

4. (a) Suppose x ∈ S is an isolated point of S. Show that {x} is *open in S.
(b) With S and x as in (a), show that every function defined on S is
continuous(ε) at x.

(c) If S is a set with no cluster points, show that every function defined on S
is continuous.


5. (a) Suppose f: [a, b] → R is continuous. Show that the function g defined by

g(x) = f(a) if x < a,
g(x) = f(x) if a ≤ x ≤ b,
g(x) = f(b) if x > b

is continuous on the whole real line. (This is called an extension of f. While
there is only one way to restrict a function to a domain smaller than its
original one, there are many ways to extend one.)

(b) If the domain of the function in (a) is an open interval, show that it might
not be possible to extend it to a continuous function on the whole real line.

(c) Suppose f: R → R is continuous and that B = f(R). Show that f,
considered as a function f: R → B, is continuous.


6. (a) Show that the topology that N inherits from R is the same as the discrete
topology on N.

(b) If S ⊆ R and every point of S is an isolated point, show that the topology
that S inherits from R is the discrete topology.

(c) The definition of relative topology applies to any set that is a subset of a
topological space. Show that if S ⊆ X and X has the discrete topology, then S
inherits the discrete topology, and if X has the indiscrete topology, then S
inherits the indiscrete topology.

(d) Suppose S ⊆ X, where X is a topological space. Give examples to show
that S can inherit the indiscrete topology even if X does not have the
indiscrete topology, and that S can inherit the discrete topology even if X
does not have the discrete topology.


¹ The Density theorem guarantees the existence of such a rational number, but the nonconstructive
nature of this statement may be disturbing. We may proceed this way: The rational numbers can be
enumerated: q₁, q₂, .... Match the interval (a, by) with the first number in this list that it contains.

² The domain of f must be a neighborhood of a for this to make sense. If we wish to eliminate this
requirement, we could change "|x − a| < δ" to "|x − a| < δ and x is in the domain of f." We will deal with
this problem more gracefully later in the chapter. For now we will just assume that the domain of f is a
neighborhood of a when it is convenient to do so.


Chapter 9 


Sequences 
9.1 AN APPROXIMATION PROBLEM 


Here is a process for finding an approximation to √2. Let f(x) = x² − 2. We will
make a succession of guesses to the solution of the equation f(x) = 0. To find our
first guess, we observe that f is continuous, f(1) < 0, and f(2) > 0. By the
Intermediate Value theorem,¹ there is a number a between 1 and 2 with f(a) = 0.
We now know that our problem has a solution, and roughly where it is. At each
stage of this process we will have both a guess for the solution and a current
interval in which the solution is known to lie. Let's take x₁ = 1 (the left endpoint
of the current interval) as our first guess. This is not the solution to the equation,
but it is at most one unit away from the solution. (We suspect it is closer than
that, but with the information we have so far this is the best we can guarantee.) If
we're happy with a guess that could be off by one unit, we're done. But we can
do better. At x = 3/2 (the center of [1, 2]), f(x) is positive. By the Intermediate
Value theorem again, we know there is a solution to f(x) = 0 between 1 and 3/2.
As before, we may use the left endpoint of our current interval as our estimate,
making x₂ = 1 also our second guess. We know more about it now, though: We
know it's at most 1/2 unit from the answer.

We can keep this up as long as we like. The center of [1, 3/2] is 5/4. The
function is negative at 5/4 and positive at 3/2, and so our next guess is x₃ = 5/4,
which is off by at most 1/4. We might hit the solution exactly at some stage, or
we might go on indefinitely. If we don't find the exact answer, just what do we
get out of this process? We get a collection of guesses: x₁, x₂, ..., each
accompanied by an estimate of how it differs from the answer. Each of these
error estimates is smaller than the previous one (even though the difference
between the guesses and the answer may not actually decrease with each step).
We can say this about our "solution":

If we say beforehand how much error we can tolerate, there is a stage in this
process where we can guarantee that the desired accuracy is achieved for that
and all subsequent estimates.

This is the essence of the theory of sequences. We need only make these
observations precise. Notice that in our example we have created a function that
takes each natural number n to our guess xₙ.
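The process just described is easy to carry out mechanically. The following sketch is one way to do so; the function name and the use of exact rational arithmetic (Python's `fractions.Fraction`) are choices made here for illustration, not part of the text. Each guess is the left endpoint of the current interval, and the accompanying error estimate is the length of that interval.

```python
from fractions import Fraction

def bisection_guesses(f, left, right, steps):
    """Return a list of (guess, error_bound) pairs.

    Assumes f(left) < 0 < f(right). Each guess is the left endpoint of the
    current interval; the error bound is the length of that interval.
    """
    left, right = Fraction(left), Fraction(right)
    out = []
    for _ in range(steps):
        out.append((left, right - left))
        mid = (left + right) / 2
        value = f(mid)
        if value == 0:               # hit the solution exactly
            out.append((mid, Fraction(0)))
            break
        if value < 0:                # solution lies in [mid, right]
            left = mid
        else:                        # solution lies in [left, mid]
            right = mid
    return out

f = lambda x: x * x - 2
for n, (guess, bound) in enumerate(bisection_guesses(f, 1, 2, 5), start=1):
    print(n, guess, bound)
```

The first three guesses are 1, 1, and 5/4, with error estimates 1, 1/2, and 1/4, in agreement with the discussion above. The returned list is a finite piece of the function that takes n to the guess xₙ.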


DEFINITION 9.1: A real sequence is a function x: N → R.

"Real" in this definition refers to the range of the function. One can consider
"rational" sequences or "complex" sequences. It is customary to use x, y, or z,
rather than f, g, or h, to name sequences, and to denote the value of x at n by xₙ,
rather than x(n) (which would be more correct). If we wish to refer to a whole
sequence, we write (xₙ) or x. This general abuse of notation shouldn't cause any
confusion. It is important, though, to distinguish a sequence (xₙ) from its range
{xₙ}. The parentheses denoting the sequence indicate that the order in which the
terms appear is important. We can refer to the "third term" of a sequence, but not
to the "third element" of a set.


EXERCISES 9.1 


1. Show that the solution to the equation in the example that opens the section
is never found exactly.

2. (a) The process described above is called the bisection method. Continue it
until you have an approximation to √2 that you can guarantee is accurate to
two decimal places.

(b) After the first guess and the first interval are found in the bisection
method, one can say exactly how many steps will be necessary before the
desired accuracy can be guaranteed. Explain.

(c) Use the bisection method to find an approximate solution to the equation
cos x = x that is accurate to two decimal places (you will need a calculator).

(d) Show that the bisection method never results in a guess that is less than
the previous one.

(e) The bisection method resembles the procedure developed in Chapter 6 for
finding the decimal expansion of a number. Discuss the benefits and
drawbacks of dividing the working interval into ten parts instead of two at
each stage of this process.


9.2 CONVERGENCE 


We now make precise our comments about the approximating procedure at the 
beginning of the chapter. 


DEFINITION 9.2: (a) A sequence (xₙ) is said to converge to L if, for every ε >
0, there is a number N_ε so that |xₙ − L| < ε whenever n > N_ε. If this is the case, we
say L is the limit of (xₙ) and write lim xₙ = L. (This is usually called the "ε-N
version" of the definition.)

(b) A sequence (xₙ) is said to converge if there is a number L so that (xₙ)
converges to L. A sequence that does not converge is said to diverge.


EXAMPLES 9.2: 1. Let x_n = 1/n and let ε > 0 be given. By Corollary 6.2.b,
there is a natural number N_ε so that 1/N_ε < ε. If n > N_ε, we have
|1/n - 0| = 1/n < 1/N_ε < ε. Thus lim 1/n = 0. Note that we had to guess that the
limit is 0 in order to begin.


2. In the example at the beginning of the chapter, the difference between x_n and
√2 is no more than 1/2^n. Since n < 2^n, we have (by choosing N_ε as in the
previous example), for n > N_ε,

    |x_n - √2| ≤ 1/2^n < 1/n < 1/N_ε < ε,

and so lim x_n = √2, as we suspected.


NOTES: 1. L must be a number. A sequence can't converge to ∞.


2. It is reasonable to assume that N_ε depends on ε. If we make our "error
tolerance" smaller, we expect to be required to go further along in the process to
achieve it. You will examine the relationship between ε and N_ε more closely in
Exercise 9.2.1. The dependence of N_ε on ε is usually suppressed in notation, and
we just write N.


3. We don't write lim_{n→∞} x_n = L simply because the limiting process in a
sequence is always the same (n → ∞). It can be useful to think of the problem as
a "limit at infinity," though, since we have seen these in calculus.
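
The dependence of N on ε in Example 1 can be explored numerically. The following
Python sketch is an illustration only, and the helper names n_for_epsilon and
check_tail are hypothetical. It chooses N = floor(1/ε) + 1, so that 1/N < ε, and
then spot-checks finitely many terms beyond N. A finite check of this kind is
evidence, not a proof.

```python
import math

def n_for_epsilon(eps):
    """Return a natural number N with 1/N < eps (the Archimedean step)."""
    return math.floor(1 / eps) + 1

def check_tail(eps, how_many=1000):
    """Spot-check |1/n - 0| < eps for the first `how_many` indices n > N."""
    N = n_for_epsilon(eps)
    return all(abs(1 / n - 0) < eps for n in range(N + 1, N + 1 + how_many))

# Smaller tolerances force us further along the sequence.
for eps in (0.5, 0.1, 0.003):
    print(eps, n_for_epsilon(eps), check_tail(eps))
```

Any larger N would serve equally well; the definition asks only that some N exist
for each ε.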


To bring our study of sequences in line with the rest of our work, we should find
a topological characterization of convergence. Note that |x_n - L| < ε if and only
if x_n ∈ (L - ε, L + ε) and that (L - ε, L + ε) is a neighborhood of L. This
suggests the following:


ALTERNATE DEFINITION 9.3: (x_n) converges to L if, for each neighborhood
V of L, there is a number N_V so that x_n ∈ V whenever n > N_V.


The phrase "the sequence does so and so for n > N" turns up often enough to 
deserve a name: 


DEFINITION 9.4: (a) A sequence (x_n) is eventually in the set S if there is a
number N so that x_n ∈ S whenever n > N.

(b) A sequence (x_n) is frequently in a set S if for any natural number N, there is
an n > N for which x_n ∈ S.


We also may use phrases such as "eventually positive" and "frequently less than 
10." It is important to remember that "eventually" and "frequently" now have 
precise meanings, which may or may not coincide with everyday usage. We may 
state the definition of convergence more succinctly: 


FINAL DEFINITION 9.5: The sequence (x_n) converges to L if it is eventually
in any neighborhood of L.


EXAMPLES 9.2: 3. Let x_n = (-1)^n. Then (x_n) does not converge. Let z be any
real number and V = (z - 1, z + 1). By Theorem 4.19, if (x_n) is eventually in V, it
must eventually have the property that, for any m and n, |x_n - x_m| < 2. The
sequence (x_n) doesn't have this property, and so it can't converge to any number.


Notice that we could have picked any ε < 2 and let V = (z - ε/2, z + ε/2). We see
that if a sequence does converge to some number z, and ε > 0 is given, it must
eventually be the case that |x_n - x_m| < ε (curiously, the last statement makes no
reference to z). This is the basis for the deepest part of the theory of convergent
sequences and may be stated roughly like this:


If the terms in a sequence get close to a limit, they must get close to each other.


We will examine the ramifications of this in the next chapter. The numbers (-1)^n
don't get close to each other, and so the sequence can't have a limit.


EXERCISES 9.2 


1. (a) Suppose that (x_n), ε and N_ε are as in Definition 9.2 and that δ > ε. Show
that it is also the case that |x_n - L| < δ whenever n > N_ε.


(b) Explain why this means that "N_ε gets bigger as ε gets smaller."


(c) Is the statement in (b) strictly true? In other words, is it really true that
δ > ε ⇒ N_δ < N_ε?


(d) With (x_n) and ε as in Definition 9.2, let
ν(ε) = min{N : |x_n - L| < ε whenever n > N}.
Show that δ > ε ⇒ ν(δ) ≤ ν(ε).


2. Give examples of a sequence of rational numbers whose limit is irrational
and a sequence of irrational numbers whose limit is rational.


3. (a) Interpret the phrase "eventually ⇒ frequently" and prove it.
(b) Show that "frequently" does not imply "eventually."


(c) Show that a sequence is eventually in a set if there are only finitely many
values of n for which it is not in the set.


(d) Show that a sequence is frequently in a set if it is in the set for infinitely
many values of n.


4. (a) Suppose f : R → R and that lim_{x→∞} f(x) = L (in the usual calculus sense).
Show that lim f(n) = L.

(b) Show that it is possible to have lim f(n) = L even if lim_{x→∞} f(x) doesn't
exist.

(c) Give a condition on f(x) that will guarantee that lim f(n) = L implies
lim_{x→∞} f(x) = L.


im <A ace 
. (a) Guess the value of |" x?+7 and use the ¢-N definition to show that your 


guess is correct. 


(b) Repeat part (a) for !™ a. 


6. (a) Show that lim x_n = L if and only if the sequence described by x_1, L, x_2, L,
x_3, L, ..., converges.


(b) Show that lim x_n = lim y_n if and only if the sequence described by x_1, y_1,
x_2, y_2, ... converges.


7. Show that a sequence can't be eventually in both of two disjoint sets. 


8. (a) Prove that if there is an ε > 0 so that (x_n) is not eventually in any interval
of length ε, then (x_n) diverges.
(b) Use this to prove (yet again) that ((-1)^n) diverges.
(c) Carefully state the converse of (a) and either prove or disprove it.


9. Given a sequence (x_n), define a sequence (a_n) by
a_n = (x_1 + x_2 + ··· + x_n)/n.


(a) If lim x_n = L, show that lim a_n = L.


(b) Give an example of a sequence (x_n) where lim a_n exists but lim x_n does not.


9.3 CONVERGENT SEQUENCES 


We now establish some basic results describing the behavior of convergent 
sequences and the relationship between such sequences and their limits. 


THEOREM 9.6: A sequence can have at most one limit.


PROOF: Suppose L ≠ M. By Exercise 4.10.4, we can find neighborhoods U of L
and V of M with U ∩ V = ∅. Since no sequence can be eventually in two disjoint
sets (Exercise 9.2.7), it is impossible for a sequence to converge to both L and M.
∎


The proof of Theorem 9.6, with its reference to neighborhoods, is essentially
topological. But Exercise 4.10.4, on which the proof is based, does not hold in
every topological space, and neither does Theorem 9.6. You will see how
dramatically it can fail in Exercise 9.6.6. We say a sequence (x_n) is bounded if
its range {x_n} is a bounded set.


THEOREM 9.7: A convergent sequence is bounded.


PROOF: Suppose (x_n) converges to L and let ε = 1. There is a natural number N
so that x_n ∈ (L - 1, L + 1) whenever n > N. Let B = max{L + 1, x_1, x_2, ..., x_N}.
Then x_n ≤ B for all n (x_n is in the set of which B is the maximum if n ≤ N, and is
less than L + 1 if n > N). Likewise, x_n ≥ min{L - 1, x_1, x_2, ..., x_N} for all n. ∎


Observe that the converse of this theorem is not true (a bounded sequence need
not converge). This proof illustrates an important technique of analysis. We
break the problem carefully into cases (even though it might not seem at first to
be a proof by cases), and obtain information about each case separately,
sometimes in different ways. Here the cases are "n ≤ N" and "n > N." The part of
the sequence for which n > N is bounded because the sequence converges, while
the part of the sequence for which n ≤ N is bounded because it's a finite set.
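
The two cases in the proof of Theorem 9.7 can be traced on a concrete sequence.
The Python sketch below is an illustration, not part of the proof. The function
name bound_from_proof is hypothetical, and the caller is assumed to supply an
index N that already satisfies |x_n - L| < 1 for all n > N.

```python
def bound_from_proof(x, L, N):
    """Lower and upper bounds built as in the proof of Theorem 9.7.

    x : function giving the n-th term (n = 1, 2, ...)
    L : the limit of the sequence
    N : an index with |x(n) - L| < 1 for every n > N (assumed, not checked)
    """
    head = [x(n) for n in range(1, N + 1)]   # the finitely many early terms
    upper = max([L + 1] + head)              # handles both n <= N and n > N
    lower = min([L - 1] + head)
    return lower, upper

# Example: x_n = 5/n converges to 0, and |5/n| < 1 once n > 5.
x = lambda n: 5 / n
lower, upper = bound_from_proof(x, L=0, N=5)
print(lower, upper)
```

The early terms are handled by taking a maximum over a finite list, and the tail
is handled by the single number L + 1, exactly as in the proof.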


It seems that if "x_n gets close to L," then the distance between x_n and L must
"get close to zero." This translates easily into the following, whose proof is
Exercise 9.3.1.


THEOREM 9.8: Let (x_n) be a sequence, L ∈ R, and d_n = |x_n - L|. Then lim x_n =
L if and only if lim d_n = 0. ∎


EXERCISES 9.3 


1. Prove Theorem 9.8. 


2. Show that lim x_n = L if and only if the following holds: Given ε > 0 and any
positive real number b, there is an N ∈ N so that |x_n - L| < bε whenever n >
N. Keep a lookout for all the places in the chapter where this could save us
some work.
3. (a) Show that n* >n for alln €N. 
(b) Show in detail that lim = = 0. 
4. (a) Show that, if lim x_n = 0 and (y_n) is bounded, then lim x_n y_n = 0 (whether
(y_n) converges or not).
(b) Does (a) remain true if lim x_n exists but is not 0?


(c) Is it possible to have (x_n y_n) converge even if (y_n) is unbounded?


5. (a) Show that if lim x_n = L, then lim |x_n| = |L|.


(b) Show that the converse of (a) is not true.


(c) Show that if lim |x_n| = 0, then lim x_n = 0.


(d) What property of 0 makes (c) true?


6. Suppose X is a topological space that does not have the property described in 
Exercise 4.10.4. Show that there must be a sequence in X that converges to 
two different limits. (A topological space having the property of Exercise 
4.10.4 is called a Hausdorff space.) 


9.4 SEQUENCES AND ORDER 


The most important aspects of the interplay between sequences and the order 
structure of the real line are seen in the behavior of increasing and decreasing 
sequences. Our examination of this topic must wait until the next chapter, but we 
can say a few things now about sequences of positive numbers. Observe that the 
following theorem draws its conclusions about sequences from properties of 
their individual terms. In the next chapter we will be more concerned with 
sequences as whole objects. 


THEOREM 9.9: (a) If (x_n) converges and x_n ≥ 0 for all n, then lim x_n ≥ 0.
(b) If (x_n) and (y_n) converge and x_n ≤ y_n for all n, then lim x_n ≤ lim y_n.


(c) If (x_n) and (y_n) both converge to L and x_n ≤ z_n ≤ y_n for all n, then (z_n)
converges and lim z_n = L.


PROOF: (a) Suppose a < 0. Then (-∞, 0) is a neighborhood of a containing no
element of {x_n}, and so (x_n) can't converge to a.

(b) Let lim x_n = L and lim y_n = M. Suppose L > M and let ε = (L - M)/2. Then
(x_n) is eventually in the interval (L - ε, L + ε) and (y_n) is eventually in
(M - ε, M + ε). But every element of (L - ε, L + ε) is larger than every element of
(M - ε, M + ε), and so for n sufficiently large, we have x_n > y_n, a contradiction.

(c) Let ε > 0 be given. There is a natural number N_1 such that x_n > L - ε
whenever n > N_1. Likewise, there is a natural number N_2 such that y_n < L + ε
whenever n > N_2. Let N = max{N_1, N_2}, so that both inequalities hold for n > N.
Then if n > N, L - ε < x_n ≤ z_n ≤ y_n < L + ε, and so lim z_n = L. ∎


Part (c) of Theorem 9.9 is often called the Squeeze theorem. Its wording
reminds us that whether a sequence converges and the value of its limit are
separate pieces of information.


EXAMPLES 9.4: 1. By letting us deal with upper and lower estimates instead
of the sequence itself, the Squeeze theorem lets us strategically ignore parts of a
formula. For instance, since -1/n ≤ (cos n)/n ≤ 1/n, and both (-1/n) and (1/n)
converge to 0, we can see that lim (cos n)/n = 0 without having to deal directly
with the sequence (cos n).
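
The squeeze in this example can be watched numerically. The short Python sketch
below is only an illustration; the function name squeezed_terms is hypothetical.
It prints the lower estimate, the term itself, and the upper estimate for a few
values of n, and confirms the inequality at each one.

```python
import math

def squeezed_terms(n):
    """Return (lower, middle, upper) = (-1/n, cos(n)/n, 1/n)."""
    return -1 / n, math.cos(n) / n, 1 / n

for n in (1, 10, 100, 1000, 10000):
    lo, mid, hi = squeezed_terms(n)
    assert lo <= mid <= hi          # the squeeze inequality at this n
    print(n, lo, mid, hi)
```

The middle column behaves erratically in sign, but it is trapped between two
columns that both shrink to 0.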


Combining Theorems 9.8 and 9.9, we obtain a very useful computational device: 


COROLLARY 9.10: If |x_n - L| ≤ δ_n and lim δ_n = 0, then lim x_n = L.


PROOF: Left as Exercise 9.4.5. ∎


EXERCISES 9.4 


1. Show that each inequality in the hypotheses of Theorem 9.9 need only hold
eventually for the conclusions to be true.


2. (a) Show that Theorem 9.9.a can't be changed to "If (x_n) converges and x_n > 0
for all n, then lim x_n > 0."
(b) What conclusion can be drawn if it is known that x_n > 0?


(c) Give a condition that would guarantee that lim x_n > 0.


3. Draw a picture to illustrate the proof of Theorem 9.9.b.


4. (a) Show in two ways that, if x_n ≥ a for all n, and (x_n) converges, then
lim x_n ≥ a: (1) Do a proof like that of Theorem 9.9.a; (2) Use the result of
Theorem 9.9.b.

(b) Show that the assumption in Theorem 9.9.c that (x_n) and (y_n) have the
same limit is necessary. What, if anything, can be said in this case if lim x_n ≠
lim y_n?


5. Prove Corollary 9.10.


6. (a) Recall that a rational number p/q is positive if p and q are either both
natural numbers or both additive inverses of natural numbers. Show that the
following defines a positive set on R:


{x ∈ R : x ≠ 0 and ∃(r_n) ∋ (r_n ∈ Q, r_n > 0, and lim r_n = x)}.


(b) Explain why it is acceptable to use the expression r_n > 0 in what purports
to be a definition of a positive set.


(c) Show that it is necessary to include the condition x ≠ 0 in the definition in
(a) [that is, show that the set is not a positive set if that condition is left out].


9.5 SEQUENCES AND ALGEBRA 


Limits of sequences respect the simple algebraic operations: 


THEOREM 9.11: Suppose lim x_n = L and lim y_n = M. Then
(a) lim(x_n + y_n) = L + M.

(b) lim(c x_n) = cL for any constant c.

(c) lim(x_n y_n) = LM.

(d) lim(x_n / y_n) = L/M provided y_n is never 0 and M ≠ 0.


PROOF: We prove (a) and leave the rest as Exercise 9.5.1. Let ε > 0 be given.
There is a number N_1 so that x_n ∈ (L - ε/2, L + ε/2) when n > N_1. (The reason for
using ε/2 here will be apparent momentarily.) Likewise, there is a number N_2 so
that y_n ∈ (M - ε/2, M + ε/2) when n > N_2. Let N = max{N_1, N_2}. If n > N, both
these conditions hold, and we have

    L + M - ε = L - ε/2 + M - ε/2 < x_n + y_n < L + ε/2 + M + ε/2 = L + M + ε,

and so lim(x_n + y_n) = L + M. ∎


Theorem 9.11 provides an easy proof of Theorem 9.9.b: If x_n ≤ y_n for all n, then
y_n - x_n ≥ 0, and so lim(y_n - x_n) ≥ 0 by Theorem 9.9.a. According to Theorem
9.11, lim(y_n - x_n) = lim y_n - lim x_n, and the result follows. We can't ignore
the hypothesis of Theorem 9.11. We simply can't say that
lim(x_n + y_n) = lim x_n + lim y_n unless we know (x_n) and (y_n) both converge.


EXERCISES 9.5 


1. (a) Complete the proof of Theorem 9.11. 


(b) Is it necessary to state both of the added conditions in part (d) of Theorem 
9.11? 


2. (a) Give an example to show that it is possible to have lim(x_n + y_n) exist
without having lim x_n or lim y_n exist.


(b) Give an example to show that it is possible to have lim(x_n y_n) exist without
having lim x_n or lim y_n exist.

(c) If lim(x_n + y_n) exists and lim x_n exists, must it be the case that lim y_n
exists?


(d) If lim(x_n y_n) exists and lim x_n exists, must it be the case that lim y_n exists?


9.6 SEQUENCES AND TOPOLOGY 


The relationship between sequences and topology is found primarily in three 
topics: cluster points, closed sets, and continuous functions. We begin with a 
lemma that makes a simple but important observation. We say a point is a 
cluster point of a sequence if it is a cluster point of its range. 


LEMMA 9.12: A convergent sequence can have at most one cluster point, its
limit.

PROOF: Suppose (x_n) converges to L and that a ≠ L. We will show that a is not
a cluster point of {x_n}. Let ε = |a - L|/2. Then the intervals (L - ε, L + ε) and
(a - ε, a + ε) are disjoint. Now there can be only finitely many values of n for
which x_n ∉ (L - ε, L + ε). This means that {x_n} ∩ (a - ε, a + ε) is finite, and a
is not a cluster point of (x_n). ∎


Lemma 9.12 doesn't say that a cluster point of a sequence is necessarily its limit, 
or even that the limit of a convergent sequence is necessarily a cluster point. The 
latter is, however, one of only two possibilities: 


THEOREM 9.13: If lim x_n = L, one of the following is true:
(i) (x_n) is eventually equal to L
or (ii) L is the only cluster point of (x_n).


PROOF: This statement is of the form A ⇒ (B or C). This is equivalent to (A
and not B) ⇒ C. Suppose that lim x_n = L and that (x_n) is not eventually equal to
L. Let ε > 0 be given. Since (x_n) is not eventually equal to L, there are infinitely
many values of n for which x_n ≠ L. But since (x_n) converges to L, it is outside the
interval (L - ε, L + ε) for only finitely many values of n. Consequently, there
must be a value of n for which x_n is different from L but x_n is in the interval
(L - ε, L + ε). Thus L is a cluster point of (x_n). By Lemma 9.12, L is the only
cluster point of (x_n). ∎


The converse of Theorem 9.13 is only partially true. It is easy to see that (x_n)
converges to L if it is eventually equal to L (and notice that such a sequence has
no cluster points). On the other hand, a sequence might have only one cluster
point yet still not converge: Let x_n = 1 if n is odd and 1/n if n is even. Then 0 is
the only cluster point of (x_n), but (x_n) doesn't converge. We will see shortly that
our inability to obtain a definitive statement along these lines is a result not so
much of the intractability of the problem, but of our not looking at it in quite the
right way.


LEMMA 9.14: The point c is a cluster point of a set S if and only if there is a 
sequence of elements of S, all different from c and converging to c. 


PROOF: The "if" part follows from Exercise 7.5.2 and Theorem 9.13. Now
suppose c is a cluster point of S. For each n there is an element of S, say x_n, so
that x_n ∈ (c - 1/n, c + 1/n)\{c}. By Corollary 9.10, the sequence (x_n) does what
we want. ∎


This proof of Lemma 9.14 uses the Archimedean property. On a deeper level, the 
result relies on the fact that the topology of the real number line is first countable 
(see Exercise 8.4.6). There are topological spaces in which neither the 
Archimedean property nor first countability hold, and Lemma 9.14 also is not 
true in every topological space. The next two theorems show us the real 
substance of the relationship between sequences and topology. 


THEOREM 9.15: A subset S of R is closed if and only if lim x_n ∈ S whenever (x_n)
is a convergent sequence whose terms are all in S.


PROOF: We will do half of the proof by the forward-backward method. Here 
there is more "forward" than "backward." Remember that this is a universally 
quantified statement, which explains the quick insertion of the second step 
below. 


    Let S be closed.
→   Let (x_n) be a convergent sequence whose terms are in S.

        * * *

    lim x_n ∈ S.


We have limited information about convergent sequences, and so we can quickly
review it all. In view of the definition of "closed," Theorem 9.13 looks useful:


    Let S be closed.
    Let (x_n) be a convergent sequence whose terms are in S.
→   lim x_n either is a cluster point of {x_n} or is an element of {x_n}.

        * * *

    lim x_n ∈ S.


Our working hypothesis is now a statement of the form A or B, and so we should
do a proof by cases. Remember: The cases must exhaust all possibilities, and
each case must lead to the desired conclusion. Theorem 9.13 guarantees that we
have exhausted all possibilities in the following:


    Let S be closed.
    Let (x_n) be a convergent sequence whose terms are in S.
    lim x_n either is a cluster point of {x_n} or is an element of {x_n}.

→   CASE 1:                                   →   CASE 2:
→   lim x_n is a cluster point of {x_n}.      →   lim x_n ∈ {x_n}.
        * * *                                         * * *
    lim x_n ∈ S.                                  lim x_n ∈ S.


We observe that {x_n} ⊆ S and use Exercise 7.5.2 for the first case. The second
case is even easier:


    Let S be closed.
    Let (x_n) be a convergent sequence in S.
    lim x_n either is a cluster point of {x_n} or is an element of {x_n}.

    CASE 1:                                       CASE 2:
    lim x_n is a cluster point of {x_n}.          lim x_n ∈ {x_n}.
→   {x_n} ⊆ S.                                →   {x_n} ⊆ S.
→   lim x_n is a cluster point of S.              lim x_n ∈ S.
    lim x_n ∈ S.


Now suppose c is a cluster point of S. By Lemma 9.14, there is a sequence of
elements of S converging to c. By hypothesis, the limit of any convergent
sequence in S is in S, and so c ∈ S and S is closed. ∎


We could define closed sets by specifying which sequences converge, then 
define open sets using Theorem 8.10. In this sense, the collection of convergent 
sequences on the real number line carries the same information as its topology 
(though this is not the case for all topological spaces). 


THEOREM 9.16: A function f : R → R is continuous if and only if, for each
convergent sequence (x_n), lim f(x_n) = f(lim x_n).


PROOF: Suppose f is continuous and let L = lim x_n and ε > 0 be given. The
interval I = (f(L) - ε, f(L) + ε) is an open set containing f(L). Thus U = f^{-1}(I)
is an open set containing L, and so U is a neighborhood of L. Since (x_n) converges
to L, it is eventually in U, and so (f(x_n)) is eventually in I, that is, (f(x_n))
converges to f(L). In the other direction, we use an ε-δ argument since the
definition of "not continuous" is more easily stated in those terms. If f is not
continuous, there is an L ∈ R so that f is not continuous(ε) at L. Then it is not
the case that

    ∀ε > 0 ∃δ > 0 ∋ ∀x (|x - L| < δ ⇒ |f(x) - f(L)| < ε),

in other words,

    ∃ε > 0 ∋ ∀δ > 0 ∃x ∋ (|x - L| < δ and |f(x) - f(L)| ≥ ε).

Let ε > 0 be provided by this statement, and for each natural number n, let x_n be
a number with |x_n - L| < 1/n and |f(x_n) - f(L)| ≥ ε. Then (x_n) converges to L, but
(f(x_n)) doesn't converge to f(L). ∎
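
The second half of the proof of Theorem 9.16 produces a sequence that witnesses a
discontinuity. A concrete instance can be sketched in Python. The step function f
below is a hypothetical example chosen for illustration, not one taken from the
text. It is discontinuous at 0, and the sequence x_n = -1/n converges to 0 while
(f(x_n)) does not converge to f(0).

```python
def f(x):
    """A step function: 0 to the left of 0, and 1 at 0 and to the right."""
    return 1.0 if x >= 0 else 0.0

L = 0.0
terms = [-1 / n for n in range(1, 8)]    # x_n = -1/n, which converges to 0
images = [f(x) for x in terms]           # f(x_n) = 0 for every n

print(images)      # a constant run of 0.0, so lim f(x_n) = 0
print(f(L))        # but f(lim x_n) = f(0) = 1
```

Here |f(x_n) - f(0)| = 1 for every n, so ε = 1 plays the role of the ε supplied by
the negated ε-δ statement.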


EXERCISES 9.6 


1. Draw a picture to illustrate the proof of Lemma 9.12. 


2. (a) Complete the proof of Lemma 9.14 and explain how the Archimedean 
property is used. 
(b) Construct a proof of Lemma 9.14 based on the first countability of the 
topology of the real line. 


3. Use Theorem 9.15 to show that the union and intersection of two closed sets 
are closed. 


4. Suppose S is a nonempty open set that isn't the whole real line. Show that 
there is a sequence of elements of S that converges to an element of C(S). 


5. Adjust Theorem 9.16 to account for functions whose domains are not the 
whole real line. Prove your new theorem. 


6. (a) Show that a sequence that is eventually constant converges. 


(b) Show that if the topological space X has the discrete topology, the only 
sequences that converge are those that are eventually constant. 


(c) Show that if the topological space X has the indiscrete topology, every 
sequence converges to every element of X (!) 


9.7 SUBSEQUENCES 


Suppose we've found that a sequence diverges. Can we say anything more about
it? The sequence ((-1)^n) diverges, but it has a "subsequence" (we will give a
precise definition of this in a moment) that goes 1, 1, 1, ..., and certainly
converges. The sequence (n) also diverges, but any sequence we might construct
by selecting terms from (n) will also diverge. Both ((-1)^n) and (n) diverge, but
their subsequences behave differently.


DEFINITION 9.17: (a) A function n : N → N is said to be strictly increasing if
n(k + 1) > n(k) for all k ∈ N.


(b) If x : N → R is a sequence and n : N → N is strictly increasing, then
x ∘ n : N → R is called a subsequence of x.


The notation of Definition 9.17 can be confusing, and we won't use it much.
Instead of writing n(k) for the values of the function n, we will write n_k (note that
the function n is, in effect, a sequence of natural numbers). The subsequence
(x(n(k))) is then written (x_{n_k}). Observe that the index of this subsequence is k,
not n, or n_k. Since n_{k+1} > n_k for all k, a subsequence of (x_n) consists of
infinitely many terms of (x_n) kept in the same order. It will be useful to note
that, if n is as in the definition, then n_k ≥ k for all k.


EXAMPLES 9.7: 1. In the example of x_n = (-1)^n, we may take n_k = 2k to get
the subsequence 1, 1, ... and n_k = 2k + 1 to get -1, -1, .... Note that a divergent
sequence can have convergent subsequences.

2. Let x_n = 1/n and n_k = k^2. Then x_{n_k} = 1/k^2. It appears that this converges
to 0, as does the original sequence.
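
Definition 9.17 describes a subsequence as a composition of two functions, and
that reading translates directly into code. The Python sketch below is an
illustration only; the helper name subsequence is hypothetical. It builds the two
subsequences of ((-1)^n) from Example 1 by composing the sequence with a strictly
increasing index function.

```python
def subsequence(x, n):
    """Return the subsequence k -> x(n(k)), where n is strictly increasing."""
    return lambda k: x(n(k))

x = lambda n: (-1) ** n                        # the sequence ((-1)^n)
evens = subsequence(x, lambda k: 2 * k)        # n_k = 2k
odds = subsequence(x, lambda k: 2 * k + 1)     # n_k = 2k + 1

print([evens(k) for k in range(1, 6)])   # [1, 1, 1, 1, 1]
print([odds(k) for k in range(1, 6)])    # [-1, -1, -1, -1, -1]
```

Note that the index of each new sequence is k: evens(3) is the third term of the
subsequence, which is the sixth term of the original sequence.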


Is there a relationship between convergence of a sequence and convergence of its
subsequences? We saw in Example 1 that convergence of some subsequences
doesn't guarantee the convergence of the sequence, but...


THEOREM 9.18: If the sequence (x_n) converges, so does every subsequence of
it, and all converge to the same limit.


PROOF: Suppose that (x_n) converges to L and (x_{n_k}) is a subsequence. Let V be a
neighborhood of L. Let N be such that x_n ∈ V whenever n > N. If k > N, we have
n_k ≥ k > N, and so x_{n_k} ∈ V. Thus (x_{n_k}) is eventually in V, and (x_{n_k})
converges to L. ∎


COROLLARY 9.19: If a sequence has subsequences that converge to two
different limits, the sequence diverges. ∎


This provides a much quicker proof that ((—1)”) diverges (it has subsequences 
converging to both 1 and —1). We will see in the next chapter that even partial 
converses to Theorem 9.18 are difficult to come by. 


EXERCISES 9.7 


1. Show that if Y is a subsequence of X and Z is a subsequence of Y, then Z is a 
subsequence of X. 


2. (a) If g : N → N is strictly increasing, show that g(k) ≥ k.


(b) A function g : R → R is strictly increasing if x > y ⇒ g(x) > g(y). Show
that even if g : R → R is strictly increasing, it is not necessarily the case that
g(x) ≥ x for all x.


3. Discuss the behavior of the sequence (cos n) and its subsequences.


4. Use Theorem 9.16 and Corollary 9.19 to show that f(x) = sin(1/x) can't be
defined at 0 in such a way to make it continuous.


5. (a) Show that it is possible for a sequence to have no convergent
subsequences at all.

(b) Show that it is possible to have a sequence (x_n) diverge while the
sequence (|x_n|) converges.


(c) Is it possible to have a sequence (x_n) with no convergent subsequences,
yet have (|x_n|) converge?


| We will prove the Intermediate Value theorem in Chapter 12. 


Chapter 10 


Sequences and the Big Theorem 


' 


Bolzano- 
Weierstrass 
10.1 CONVERGENCE WITHOUT LIMITS


We know that lim 1/n = 0 and that the sequence we found at the beginning of the 
previous chapter converges. Our knowledge of the latter sequence, however, is 
quite different from our knowledge of the former. Though we can estimate the 
difference between the terms of the latter sequence and its limit, we don't know 
exactly what that limit is (though we certainly have strong suspicions). Our 
definitions of convergence are designed to allow us to check whether a guess 
really is the limit of a sequence, but give us no hint how to find those guesses in 
the first place. In this chapter we develop methods by which we can sometimes 
decide whether a sequence converges without ever knowing its limit. It is quite 
remarkable that we can even hope to do this. These results are among the 
deepest we see in this book, and have the most important ramifications in 
applied mathematics. 


As we would expect, the first of these theorems applies only to the simplest 
sorts of sequences. Here we will also see the deeper aspects of the relationship 
between convergence of sequences and the order structure of the real numbers. 


10.2 MONOTONE SEQUENCES 


DEFINITION 10.1: A sequence (x_n) is increasing if x_{n+1} ≥ x_n for each n, and
decreasing if x_{n+1} ≤ x_n for each n. A sequence is monotone if it is either
increasing or decreasing.


Note the inequality symbols carefully. The sequence given by 1, 1, ..., is both
increasing and decreasing. If it happens that x_{n+1} > x_n for each n, we say (x_n)
is strictly increasing, and similarly for the other terms.


EXAMPLES 10.2: 1. The sequence (1/n) is strictly decreasing since 1/(n + 1) <
1/n for all n. The sequence (1 - 1/n) is strictly increasing.


2. The sequence 0, 1, 1/2, 1/3, ... is not monotone, even though it's not very
different from (1/n). It is not increasing because 1/2 = x_3 < x_2 = 1 and is not
decreasing because 1 = x_2 > x_1 = 0. It is, however, eventually decreasing.


3. The sequence ((-1)^n) is not monotone and not eventually monotone. If n is
odd, -1 = (-1)^n < (-1)^{n+1} = 1 and 1 = (-1)^{n+1} > (-1)^{n+2} = -1. Since we can
find odd numbers as large as we want, the sequence is not eventually monotone.


4. The sequence constructed at the beginning of Chapter 9 to estimate √2 is
increasing (but not strictly increasing). The process described there can never
result in a guess that is to the left of the previous one.
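
The first three examples can be checked on finite prefixes of each sequence. The
Python sketch below is an illustration only; the function names is_increasing and
is_decreasing are hypothetical. A computer can inspect only finitely many terms,
so a True result here supports a claim of monotonicity without proving it, while
a False result does settle the matter.

```python
def is_increasing(terms):
    """True if each term is >= the one before it (Definition 10.1, finite prefix)."""
    return all(b >= a for a, b in zip(terms, terms[1:]))

def is_decreasing(terms):
    """True if each term is <= the one before it (Definition 10.1, finite prefix)."""
    return all(b <= a for a, b in zip(terms, terms[1:]))

ex1 = [1 / n for n in range(1, 20)]           # (1/n)
ex2 = [0] + [1 / n for n in range(1, 20)]     # 0, 1, 1/2, 1/3, ...
ex3 = [(-1) ** n for n in range(1, 20)]       # ((-1)^n)

print(is_decreasing(ex1))                        # True
print(is_increasing(ex2), is_decreasing(ex2))    # False False
print(is_decreasing(ex2[1:]))                    # True: decreasing after the first term
print(is_increasing(ex3), is_decreasing(ex3))    # False False
```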


The following is the first step toward our goal of deciding whether a sequence 
converges without having to know its limit. It is sometimes called the Monotone 
Convergence Theorem, but BEWARE, there is another theorem by this name 
down the road in your mathematical career (it is about a different subject and 
much deeper). We pick out a section of the proof as a lemma. It was used in 
Theorem 7.6, but its proof was put off because we now have better terminology. 
You should check to see that there is no circular reasoning here, that is, that there 
is no use made of Theorem 7.6 in this proof. This is always a risk when the proof 
of a result is so far removed from the place it is used. 


LEMMA 10.2: (a) Any cluster point of an increasing sequence is an upper 
bound for the sequence. 


(b) Any cluster point of a decreasing sequence is a lower bound for the 
sequence. 


PROOF: We will prove (a). Call the sequence (x_n) and suppose that x < x_{n_0} for
some n_0. Notice that (-∞, x_{n_0}) is a neighborhood of x but
{x_n} ∩ (-∞, x_{n_0}) is finite (it has at most n_0 - 1 elements). Thus x is not a
cluster point of {x_n}. Note in particular that no element of {x_n} is a cluster
point of {x_n}. ∎


THEOREM 10.3: A bounded, monotone sequence converges. 


PROOF: Let (x_n) be a bounded, monotone sequence. We may suppose that (x_n)
is increasing. Its range is either finite or infinite. If its range is finite, (x_n) is
eventually constant (you will verify this in Exercise 10.2.6), and so it converges.
If {x_n} is infinite, it has a cluster point by the Bolzano-Weierstrass theorem. Let
u be a cluster point of {x_n} and let ε > 0 be given. By Lemma 10.2, u ≥ x_n for all
n. By the definition of cluster point, there is an element of {x_n}, say x_N, in the
interval (u - ε, u + ε). Then for all n > N, we have x_N ≤ x_n ≤ u. Then
x_n ∈ (u - ε, u + ε) for all such n and (x_n) converges to u. ∎


By Theorem 5.3, u = sup{x_n}, allowing us to sharpen this result a bit:


COROLLARY 10.4: If (x_n) is increasing and bounded, then lim x_n = sup{x_n}. If
(x_n) is decreasing and bounded, then lim x_n = inf{x_n}.


PROOF: Left as Exercise 10.2.5. This is a corollary as much to the proof of
Theorem 10.3 and to Theorem 5.3 as to Theorem 10.3 itself. ∎


EXERCISES 10.2 


1. Which of these is the definition of "monotone"? What does the other say
about the sequence (x_n)?


(i) ∀n (x_n ≥ x_{n+1} or x_n ≤ x_{n+1})
(ii) ∀n (x_n ≥ x_{n+1}) or ∀n (x_n ≤ x_{n+1})


2. Complete the proof of Lemma 10.2 and verify the comment at the end of its 
partial proof. 


= _ 1 1 So l ; Pay a Sia : 
3. Let Se =1+a+5+°' +52, Show that 8x <2-, forn = 2,3,.... What does this 
say about lim s,,? 


4. Show that the sequence (100^n/n!) is eventually monotone. Is it eventually
increasing or decreasing? Find the value of N after which it is monotone.
Does it converge?


5. Prove Corollary 10.4.


6. (a) Show that a sequence that is eventually constant has a finite range.


(b) Show that the converse of this is not true (a sequence with a finite range
is not necessarily eventually constant).


(c) Show that a monotone sequence with a finite range is eventually constant.


7. If S is a bounded set, show that there is an increasing sequence of elements of
S converging to sup S and a decreasing sequence of elements of S converging
to inf S.


8. Suppose f is an increasing, bounded function whose domain contains some
ray [a, ∞). Show that lim_{x→∞} f(x) exists.


9 Define H and G by ™ ak en raid =Lt gto te 
(a) Show that H and G are both increasing. 
(b) Show that H diverges and G converges. 
(c) What is lim g,,? 


(d) Suppose (a_n) is a sequence of positive numbers and let S = (s_n) be defined
by s_n = a_1 + a_2 + ... + a_n. Show that S converges if and only if it is bounded.


(e) Suppose (a_n) and (b_n) are sequences of positive numbers, with a_n ≤ b_n for
all n. Let S and T be constructed from (a_n) and (b_n), respectively, as in (d).
Show:

(i) If T converges then S converges

(ii) If S diverges then T diverges

and (iii) It is possible for S to converge and T to diverge.

(f) (Adventure!) Show that lim(h_n - ln(n)) exists. [Hints: Think about
integrals, Riemann sums, and rectangles. You will need to know that the
sequence defined as in (d) with a_n = 1/n^2 converges.] This limit is called
Euler's Constant. It is denoted γ, and its value is approximately 0.577. No
one knows whether γ is rational or irrational!


10.3 A RECURSIVELY DEFINED SEQUENCE 


Define a sequence like this: Let x_1 = 1 and x_{n+1} = √(3 + x_n) for n ≥ 1. This is
called a recursively defined sequence. Recursive procedures are useful in a variety
of settings. They can often be used to approximate a solution where an exact one is
difficult or impossible to find. They also lend themselves nicely to proofs by
induction. We will show that (x_n) converges. Its first few terms are
1, 2, √5 ≈ 2.236, √(3 + √5) ≈ 2.288, .... The sequence seems to be increasing. We can
prove this by induction:

    P(n): x_{n+1} ≥ x_n
    P(1): x_2 = 2 ≥ 1 = x_1

    Assume P(k): x_{k+1} ≥ x_k
    Want P(k + 1): x_{k+2} ≥ x_{k+1}

    x_{k+2} = √(3 + x_{k+1})
            ≥ √(3 + x_k)
            = x_{k+1}.
    (since √ is an increasing function, and using the induction hypothesis)


A little more work with the calculator suggests that the sequence grows very 
slowly. For instance, $x_{10}$ is only about 2.3. (A calculator is often a good place to


look for ideas about sequences, but one must be sure to prove that those guesses 
are correct.) To be cautious, we might guess that all the terms are less than 3. 
This, too, can be proved by induction. 


P(n): $x_n < 3$

P(1): $x_1 < 3$
Assume P(k): $x_k < 3$
Want P(k + 1): $x_{k+1} < 3$

$x_{k+1} = \sqrt{3 + x_k}$
$< \sqrt{3 + 3}$
$= \sqrt{6}$
$< 3$. (induction hypothesis; $\sqrt{\ }$ is increasing)

Thus $(x_n)$ is bounded and increasing, and so it converges. What is its limit?
Knowing that a sequence has a limit can be a big step toward finding it. Call the
limit $L$. If we make $n$ very large, $x_n$ is very close to $L$ (we gloss over some details
here). Since $1 < L < 3$, $\sqrt{\ }$ is continuous at $3 + L$, and so if $x_n$ is very close to
$L$, $x_{n+1} = \sqrt{3 + x_n}$ is very close to $\sqrt{3 + L}$. But $x_{n+1}$ is also very close to $L$, and so
$L \approx x_{n+1} = \sqrt{3 + x_n} \approx \sqrt{3 + L}$ (and $\approx$ can be made as close to $=$ as we
like). The limit of this sequence is between 1 and 3 and is a solution to
$L = \sqrt{3 + L}$. This means $L = (1 + \sqrt{13})/2 \approx 2.303$.
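
The calculator work described in this section can be reproduced with a few lines of Python. This is only a sketch for building intuition: it assumes the recursion $x_1 = 1$, $x_{n+1} = \sqrt{3 + x_n}$ from the start of the section, uses floating-point arithmetic, and the function name `recursive_sqrt_sequence` is ours. It prints each term together with its distance from $(1 + \sqrt{13})/2$.

```python
import math

def recursive_sqrt_sequence(n_terms, x1=1.0):
    """Return [x_1, ..., x_n] for the recursion x_{k+1} = sqrt(3 + x_k)."""
    terms = [x1]
    for _ in range(n_terms - 1):
        terms.append(math.sqrt(3.0 + terms[-1]))
    return terms

terms = recursive_sqrt_sequence(10)
limit = (1.0 + math.sqrt(13.0)) / 2.0

# Each row: index n, the term x_n, and the gap between the limit and x_n.
for n, x in enumerate(terms, start=1):
    print(n, round(x, 6), round(limit - x, 6))
```

The output is evidence, not proof; the two induction arguments above are what establish that the sequence is increasing and bounded.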


Sequences like the one in Example 10.2.2 can make application of the Monotone 
Convergence theorem difficult. It seems, though, that the behavior of a sequence 
near the beginning should not affect its convergence or divergence. Adjusting the 
proof of Theorem 10.3, we may show: 


THEOREM 10.5: A sequence that is bounded and eventually monotone 
converges. ∎


This is not the whole story since not every convergent sequence is eventually 


monotone. The sequence $((-1)^n/n)$ converges to 0 (Exercise 9.3.5) but is not
eventually monotone. 


EXERCISES 10.3 
1. Prove Theorem 10.5. 
2. Say how we know, in the example in this section, that $1 < L < 3$.


3. Suppose that $F: \mathbb{R} \to \mathbb{R}$, that $x_1 = y_1$, and that $x_{n+1} = F(x_n)$ and $y_{n+1} = F(y_n)$.
Show that $x_n = y_n$ for all $n$.


4. Prove that the behavior of a sequence near the beginning doesn't affect its
convergence or divergence in the following way. For a sequence $X = (x_n)$, let
$X(m)$ be the sequence given by $x(m)_n = x_{n+m}$. Show that $X$ converges if and
only if $X(m)$ converges for every $m \in \mathbb{N}$. (The hardest part of this is
understanding the notation!)


. Letx; =a>0Oand“"*! ~ *»~ =,, Show that (x,,) diverges. 


6. Suppose $a > 0$ and $z_1 > 0$. Let a sequence be defined recursively by
$z_{n+1} = (a + z_n)^{1/2}$ for $n \ge 1$. Discuss the convergence of this sequence for various
values of $a$.


. Let the sequence (s,,) be given by #1 = V2 and 8n+1 = V2 + Vn forn=1, 2, .... 
Show that (s,,) converges and find its limit. 


8. Newton's Method for finding square roots is described by


$x_1 = 1$; $x_{n+1} = \frac{1}{2}\left(x_n + \frac{a}{x_n}\right)$ for $n \ge 1$.


(a) If $a > 0$, show that $(x_n)$ is a bounded, eventually monotone sequence (the
only term that might not be "in order" is the first one).


(b) Show that $\lim x_n = \sqrt{a}$.


9. The general Newton's method is a recursive procedure for approximating the
solution to an equation $f(x) = 0$. The means of obtaining a guess $x_{n+1}$ from the
previous guess $x_n$ is described in the following picture:


[Figure: the graph of $y = f(x)$ with its tangent line at the point $(x_n, f(x_n))$.]


The point $x_{n+1}$ is the intersection of the tangent line to $y = f(x)$ at the point
$(x_n, f(x_n))$ with the x-axis.


(a) Beginning with this diagram, show that $x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}$.


(b) Note that $\sqrt{a}$ is a solution to $x^2 - a = 0$. Verify the formula in Exercise
10.3.8.


(c) Find Newton's method for cube roots. 

(d) If (i) $f(x)$ is strictly monotone and concave up on an interval that contains
both the solution and the initial guess $x_1$, (ii) $f' > 0$ at the solution, and (iii)
the initial guess is greater than the solution (as in the picture), show that $(x_n)$
is decreasing and bounded.

(e) If the conditions in (d) hold, show that $\lim x_n$ is the solution to the
equation $f(x) = 0$.

(f) The formula for Newton's method suggests problems if $f'(x_n) = 0$ for any
$n$. Why? Draw some sketches to see what might happen if $f'$ is close to or
equal to 0 anywhere between the initial guess and the solution.

(g) When Newton's method works, the convergence of the guesses to the
solution can be quite rapid. Using $x_1 = 2$ as the initial guess, apply Newton's
method for square roots 5 or 6 times, and observe the way the error changes
(you'll be doing this on a calculator, which can also give you the value of $\sqrt{2}$).
Is there a pattern in the way the errors decrease? Subtract each error from
the previous one. Divide each error by the previous one. Count decimal 


places of accuracy. Make a conjecture. Does the same pattern hold for the 
cube root method you found in (c)? 


(h) Apply Newton's method (with a calculator) to a problem for which you 
don't know the answer. (For instance, use it to find the solution to cos(x) = x.) 
Can you relate the pattern of the guesses to what you discovered in (g)? 


10. Define the sequence (x,,) by x, = 1 and “"*! = forn>1. The procedure 
described in the section suggests that lim, = 3. Is this true? 


11. The point $a$ is called a fixed point of the function $f$ if $f(a) = a$. If a
convergent sequence is defined by $x_1 = a$ and $x_{n+1} = f(x_n)$, and if $f$ is
continuous at the limit of the sequence, show that the limit is a fixed point of
$f$.


12. Describe the expression x” in terms of a recursive sequence—recall that 
a” =a), Are there any values of x for which this sequence has a limit? 
What is that limit? 


a ; 

13. a) Consider the sequence given by “! © .fn+l = Ie, Show that this 
sequence converges (consider the subsequences consisting of every other 
term). What is the limit of this sequence? 


(b) If we begin to write these terms out, they look like


$$2 + \cfrac{1}{2 + \cfrac{1}{2 + \cfrac{1}{2 + \cdots}}}$$


Such a construction is called a continued fraction. We may represent a
continued fraction by specifying the numbers along the lower left diagonal.
For instance, the one above would be represented as [2, 2, 2, ...], while [1,
2, 3, ...] would give us


$$1 + \cfrac{1}{2 + \cfrac{1}{3 + \cdots}}$$


Even though the limit of the sequence in (a) is irrational (and so has a 
nonrepeating decimal expansion), it does have a repeating continued 
fraction representation. Find [1, 1, 1, ...]. The study of continued fractions 
has given rise to a number of advances in mathematics. Stieltjes was 
studying continued fractions when he conceived of the Riemann-Stieltjes 
integral (see Chapter 17), though the connection is tough to pick out. 

(c) (Research) Suppose we take a sequence $(a_n)$ and construct from it a
continued fraction $[a_1, a_2, \ldots]$. Is there a relationship between the
convergence of the sequence and the convergence of the continued fraction?
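
For the calculator experiment in Exercise 9(g), a short Python loop can stand in for the calculator. The sketch below assumes the square-root iteration of Exercise 8, $x_{n+1} = \frac{1}{2}(x_n + a/x_n)$, with $a = 2$ and initial guess 2; the function name `newton_sqrt` is ours. It prints each guess with its error and leaves the conjecture about the pattern to the reader.

```python
import math

def newton_sqrt(a, n_terms, x1=1.0):
    """Return [x_1, ..., x_n] for the iteration x_{k+1} = (x_k + a / x_k) / 2."""
    terms = [x1]
    for _ in range(n_terms - 1):
        x = terms[-1]
        terms.append(0.5 * (x + a / x))
    return terms

guesses = newton_sqrt(2.0, 6, x1=2.0)
errors = [abs(x - math.sqrt(2.0)) for x in guesses]

# Each row: index n, the guess x_n, and its error |x_n - sqrt(2)|.
for n, (x, e) in enumerate(zip(guesses, errors), start=1):
    print(n, x, e)
```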


10.4 THE BOLZANO-WEIERSTRASS THEOREM (REVISITED) 


We modify Lemma 9.14 to obtain the following: 


THEOREM 10.6: If $c$ is a cluster point of $\{x_n\}$, there is a subsequence $(x_{n_k})$
converging to $c$.


PROOF: This seems very similar to Lemma 9.14, but if we try the same proof,
the elements we find don't necessarily form a subsequence of $(x_n)$ (they may not
be in the proper order). We modify that proof, being careful to keep the elements
we select in the proper order. Since $\{x_n\} \cap (c - 1/k, c + 1/k)$ is infinite for each $k$,
so is the set


$S_k = \{n : x_n \in (c - 1/k, c + 1/k)\}$.


Let $x_{n_1} \in (c - 1, c + 1)$. Since $S_2$ is infinite, there is an $n_2 > n_1$ with
$x_{n_2} \in (c - 1/2, c + 1/2)$. Similarly, since $S_3$ is infinite, there is an $n_3 > n_2$ with
$x_{n_3} \in (c - 1/3, c + 1/3)$. Continuing in this way, we find $x_{n_1}, x_{n_2}, x_{n_3}, \ldots$, with
$n_{k+1} > n_k$ and $x_{n_k} \in (c - 1/k, c + 1/k)$ for each $k$. It follows that $\lim x_{n_k} = c$. ∎
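
The index-selection step in this proof can be imitated on a computer. The sketch below is an illustration under assumptions, not part of the proof: the sequence $x_n = (-1)^n + 1/n$ and the point $c = 1$ are our own example, the names `subsequence_indices` and `search_limit` are hypothetical, and a program cannot verify that each $S_k$ is infinite, so the search is capped. It always looks for the next index strictly beyond the previous one, which is exactly what keeps the chosen terms in the proper order.

```python
def subsequence_indices(x, c, k_max, search_limit=10**6):
    """Return indices n_1 < n_2 < ... < n_{k_max} with |x(n_k) - c| < 1/k.

    x is a function n -> x_n for n = 1, 2, 3, ...
    search_limit is a safety cap on how far the search may run.
    """
    indices = []
    n = 0
    for k in range(1, k_max + 1):
        n += 1  # start past the previous index so that n_k > n_{k-1}
        while abs(x(n) - c) >= 1.0 / k:
            n += 1
            if n > search_limit:
                raise ValueError("no suitable index found below search_limit")
        indices.append(n)
    return indices

def example_sequence(n):
    """Our example sequence x_n = (-1)^n + 1/n, which has 1 as a cluster point."""
    return (-1) ** n + 1.0 / n

indices = subsequence_indices(example_sequence, 1.0, 8)
print(indices)
print([example_sequence(n) for n in indices])
```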


In this proof it is more important that the sets $S_k$ are infinite than it is that $c$ is a
cluster point of $\{x_n\}$. This suggests a change in our point of view:


DEFINITION 10.7: A point $c$ is a sequential cluster point of $(x_n)$ if
$\{n : x_n \in (c - \varepsilon, c + \varepsilon)\}$ is infinite for each $\varepsilon > 0$.


Note well that $c$ is a sequential cluster point of $(x_n)$ if the set of subscripts for
which $x_n \in (c - \varepsilon, c + \varepsilon)$ is infinite for every $\varepsilon$. Every cluster point of a sequence
is a sequential cluster point, but the converse is not true. (This is another way 
mathematics gets done. If we observe that a proof passes through an important 
intermediate stage, we make a definition to encompass anything satisfying the 
condition of that stage.) 


THEOREM 10.8: $c$ is a sequential cluster point of $(x_n)$ if and only if $(x_n)$ has a
subsequence converging to $c$.


PROOF: Left as Exercise 10.4.2. (We may consider this an alternative
definition of "sequential cluster point.") ∎


The following, sometimes called the Bolzano-Weierstrass Theorem for 
Sequences, reassures us that we've found an appropriate analogue for cluster 


points. 


THEOREM 10.9: Every bounded sequence has a sequential cluster point. 


PROOF: Let $(x_n)$ be a bounded sequence. Then $\{x_n\}$ is either finite or infinite. If


it is finite, at least one element of it must be repeated for infinitely many values 
of n. This repeated element, with its subscripts kept in order, is a convergent 
subsequence. If the range is infinite, the Bolzano-Weierstrass theorem (for sets) 
guarantees that the range has an ordinary cluster point. Such a point is also a 
sequential cluster point by Theorems 10.6 and 10.8. ∎


Combining the last two theorems, we obtain the following, also sometimes 
called the Bolzano-Weierstrass theorem for sequences. 


COROLLARY 10.10: Every bounded sequence has a convergent subsequence. 
∎


EXERCISES 10.4 


1. Prove the comment preceding Theorem 10.8.


2. Prove Theorem 10.8.


3. Give an example of a sequence $(x_n)$ having both 1 and 2 as sequential cluster
points but such that $\{x_n\}$ has only 1 as a cluster point.


4. Find all sequential cluster points of the sequence $(\sin(n))$. (Very difficult!)


5. Show that a bounded sequence diverges if and only if it has (at least) two
sequential cluster points.


6. (a) Show that every sequence has a monotone subsequence.


(b) Use this to prove the Bolzano-Weierstrass theorem for sequences.


7. Prove directly that a monotone sequence can have only one sequential cluster
point. (In this case, "directly" means that you shouldn't refer to Theorem
10.5.)


8. Verify the following statement from the proof of Theorem 10.9: "If [the
range] is finite, at least one element of it must be repeated for infinitely many
values of n."


9. Show that a sequence with an ordinary cluster point can't be eventually 
constant. 


10. (a) The limit superior of a sequence $(a_n)$, denoted $\limsup a_n$, is the
supremum of its set of sequential cluster points [if $(a_n)$ is not bounded above,
$\limsup a_n = \infty$]. Show that $\limsup a_n = \lim_{n \to \infty} \sup_{k \ge n} \{a_k\}$ (hence the
name).

(b) The limit superior of a set was defined in Exercise 7.6.6. Show that
$\limsup a_n$ is not necessarily the same as $\limsup \{a_n : n \in \mathbb{N}\}$.

(c) State a definition and repeat (a) for the limit inferior of a sequence.

(d) Show that $\lim a_n = L$ if and only if $\liminf a_n = \limsup a_n = L$.

(e) Show that in general, $\limsup(a_n + b_n) \ne \limsup a_n + \limsup b_n$. Is there
any predictable relationship between the sides of this expression?

(f) Show that a bounded sequence $(a_n)$ has a subsequence that converges to
$\limsup a_n$. (This provides another proof of the Bolzano-Weierstrass theorem
for sequences, because every bounded sequence has a lim sup.)

(g) Suppose $(a_n)$ is a bounded sequence with $a_n \ge a$ for all $n$ and $\limsup a_n = a$.
Show that $\lim a_n = a$.


(h) Show that it is possible to have $\limsup a_n = \inf a_n$ and it is possible to
have $\liminf a_n = \sup a_n$. Is it possible for both of these to happen for the
same sequence?


11. (a) Let $r_1, r_2, \ldots$, be an enumeration of the rational numbers in the interval
[0, 1]. Show that every element of [0, 1] is a sequential cluster point of $(r_n)$.


(b) Is there a sequence whose set of sequential cluster points is (0, 1)? 


12. Show that if a sequence has a bounded subsequence, it also has a 
convergent subsequence. 


13. If $X$ and $Y$ are sequences, define their "weave" $XWY$ to be the sequence
given by $x_1, y_1, x_2, y_2, \ldots$, and denote the set of sequential cluster points of $X$
by $X'$. Show that $(XWY)' = X' \cup Y'$.


10.5 THE CONVERSE OF THEOREM 9.18? 


The converse of Theorem 9.18 would say "If every subsequence of $(x_n)$
converges to $L$, so does $(x_n)$." This is such a weak statement that it says nothing


of interest. Useful partial converses of Theorem 9.18 are hard to come by. The 
Bolzano-Weierstrass theorem for sequences gives us the following. Sometimes a 
slight weakening of a useless statement can make it into a useful one. 


THEOREM 10.11: If $(x_n)$ is a bounded sequence with the property that every
convergent subsequence converges to the same number $L$, then $(x_n)$ converges to
$L$.


PROOF: Since $(x_n)$ is bounded, there is an interval $[-B, B]$ that contains $\{x_n\}$.
Suppose that $(x_n)$ doesn't converge to $L$. Then there is an $\varepsilon > 0$ so that $(x_n)$ is not
eventually in $V = (L - \varepsilon, L + \varepsilon)$. Then there is a subsequence $(x_{n_k})$ with
$x_{n_k} \in [-B, B] \setminus V$ for all $k$. Now $(x_{n_k})$ is bounded and so has a sequential cluster
point, say $c$. Since $\{x_{n_k}\} \subseteq [-B, B] \setminus V$ and $[-B, B] \setminus V$ is closed, it must be that
$c \in [-B, B] \setminus V$. In particular, $c \ne L$. Any sequential cluster point of $(x_{n_k})$ is also a
sequential cluster point of $(x_n)$, and so, by Theorem 10.8 and the hypothesis, $c = L$, a contradiction. ∎


EXERCISES 10.5 
1. Why is Theorem 10.11 a weakening of the converse of Theorem 9.18? 
2. Does Theorem 10.11 hold if the sequence isn't assumed to be bounded? 


3. Suppose $(x_n)$ is a bounded sequence such that every subsequence of $(x_n)$ has a
subsequence that converges to $L$. Show that $(x_n)$ converges to $L$.


10.6 CAUCHY SEQUENCES 


While examining the sequence $((-1)^n)$, we noticed that if the terms in a sequence
get close to a limit, they must get close to each other. In one of life's ironic 
twists, the deepest of ideas springs from this simple observation. 


DEFINITION 10.12: The sequence $(x_n)$ is a Cauchy sequence if, for any $\varepsilon > 0$,
there is a natural number $N$ so that $|x_m - x_n| < \varepsilon$ whenever $m > N$ and $n > N$.


The definition of a Cauchy sequence is a precise statement of the phrase "the
terms in the sequence get close to each other." Unfortunately, this concept does
not lend itself easily to topological interpretation.

Because $|a - b| = |b - a|$, it doesn't matter which of $m$ or $n$ is larger, and so
we may say "$|x_m - x_n| < \varepsilon$ whenever $m \ge n > N$," if it is convenient.


EXAMPLES 10.6: 1. (a) is a Cauchy sequence. Let ¢ > 0 be given, and let N be 
such that 1/N < ¢«. If m>n > N, then 


|Seig Thy | = 


l 1 
— Ra eal + 
2" and let m >n > N. Then 


Nir 
cs 


eee 


which can be made as small as we like (you should check all of these 
statements). 


3. $(\ln(n))$ is not a Cauchy sequence. Let $m > n$. Examine $\ln(m) - \ln(n)$. By the
Mean Value theorem,¹ this is equal to $(m - n)(1/c)$ for some $c$ between $m$ and $n$.
Then $1/c > 1/m$, and so $(m - n)(1/c) > (m - n)/m = 1 - n/m$. As $m \to \infty$, this approaches 1
for any fixed $n$. Thus $\ln(m) - \ln(n)$ can't be made close to 0 in the sense of Cauchy
sequences. Notice, however, that $\ln(n) - \ln(n - 1)$ does go to 0 as $n \to \infty$. The
latter condition ($x_n - x_{n-1} \to 0$) is not sufficient to guarantee that a sequence is a
Cauchy sequence.
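
The contrast in this example is easy to see numerically. The sketch below is our own illustration (the function names are hypothetical): it compares the gap between consecutive terms, $\ln(n) - \ln(n - 1)$, with the gap $\ln(2n) - \ln(n)$ obtained by taking $m = 2n$. The first shrinks toward 0, while the second equals $\ln 2$ for every $n$, in line with the lower bound $1 - n/m = 1/2$ above.

```python
import math

def consecutive_gap(n):
    """Return ln(n) - ln(n - 1), the gap between neighboring terms."""
    return math.log(n) - math.log(n - 1)

def doubling_gap(n):
    """Return ln(2n) - ln(n), the gap between the terms with indices n and m = 2n."""
    return math.log(2 * n) - math.log(n)

# Each row: n, the consecutive gap, and the doubling gap.
for n in (10, 1000, 100000):
    print(n, consecutive_gap(n), doubling_gap(n))
```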


4. The sequence constructed in the example at the beginning of Chapter 9 is a
Cauchy sequence. If we let $I_n$ be the current interval at the $n$th stage of that
process, observe that $x_k \in I_n$ whenever $k > n$. By Theorem 4.19, $|x_j - x_k| <$ (the
length of $I_n$) $= 1/2^n$ if $j, k > n$. Since $\lim 1/2^n = 0$, $(x_n)$ is a Cauchy sequence. This
example shows that approximation procedures can sometimes be shown to 


produce Cauchy sequences even when the exact solution to a problem is not 
known. 


Earlier, we observed that "if the terms in a sequence get close to a limit, they 
must get close to each other," and this led us to the idea of Cauchy sequences. 
We may now prove this observation. 


THEOREM 10.13: A convergent sequence is a Cauchy sequence. 


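PROOF: Let $(x_n)$ converge to $L$ and let $\varepsilon > 0$ be given. Let $N$ be such that
$|x_n - L| < \varepsilon/2$ whenever $n > N$. If $m, n > N$, then


$|x_m - x_n| = |x_m - L + L - x_n|$
$\le |x_m - L| + |L - x_n|$
$< \varepsilon/2 + \varepsilon/2 = \varepsilon$. ∎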

Theorem 10.13 is not very deep, but its converse is very much so. We will 
approach this through two lemmas describing properties of Cauchy sequences. 
Keep in mind that this is an "after the fact" organization of the proof, and the 
usefulness of each lemma may not be immediately apparent. 


LEMMA 10.14: If any subsequence of a Cauchy sequence converges, the
sequence does also (and, by Theorem 9.18, to the same limit).


PROOF: This proof is as easy to see in words as in symbols: The terms in a
Cauchy sequence get closer and closer together, and if it has a convergent
subsequence, some of its terms get close to a limit, and so all its terms must get
close to that limit. All we need to do is translate this into a precise argument: Let
$(x_{n_k})$ be a subsequence of $(x_n)$ converging to $L$. Let $\varepsilon > 0$ be given and let $N_1$ be
such that $|x_m - x_n| < \varepsilon/2$ whenever $m \ge n > N_1$. Let $N_2$ be such that
$|x_{n_k} - L| < \varepsilon/2$ whenever $k > N_2$ and let $N = \max\{N_1, N_2\}$. If $k > N$, then


$|x_k - L| \le |x_k - x_{n_k}| + |x_{n_k} - L| < \varepsilon/2 + \varepsilon/2 = \varepsilon$.


So $\lim x_n = L$. The estimate on $|x_k - x_{n_k}|$ holds because $n_k \ge k > N$. ∎


We need to know whether a Cauchy sequence necessarily has a convergent 
subsequence. This is a consequence of the following, whose proof should have a 
familiar ring to it. Compare this to our earlier proof that a convergent sequence is 
bounded. 


LEMMA 10.15: A Cauchy sequence is bounded.


PROOF: Let $N$ be such that $|x_m - x_n| < 1$ whenever $m, n > N$.


Then we have, in particular, that $|x_{N+1} - x_n| < 1$ whenever $n > N$. Then $\{x_n\}$ is
bounded above by $\max\{x_{N+1} + 1, x_1, \ldots, x_N\}$ and below by
$\min\{x_{N+1} - 1, x_1, \ldots, x_N\}$. ∎


The pieces are now in place, and we can establish the next part of the Big 
Theorem. Note that the Archimedean property (as stated in part (c) of the Big 
Theorem) implies itself (as stated in part (e) of the Big Theorem). 


THEOREM 10.16: If $F$ is an ordered field in which the Bolzano-Weierstrass
theorem holds, then a Cauchy sequence of elements of $F$ converges to an element
of $F$.


PROOF: By Lemma 10.15, a Cauchy sequence is bounded. By the
Bolzano-Weierstrass theorem for sequences, it has a convergent subsequence whose limit
is an element of $F$. By Lemma 10.14, this is the limit of the sequence. ∎


EXAMPLES 10.6: 1. Consider the decimal expansion $0.d_1 d_2 d_3 \ldots$, and let
$x_n = 0.d_1 d_2 \ldots d_n$. If $m > n$, then $x_m - x_n = 0.0 \ldots 0\, d_{n+1} \ldots d_m < 1/10^n$. Thus $(x_n)$ is a


Cauchy sequence, and so it converges. This is another proof that every decimal 
expansion corresponds to a real number. 
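
The estimate $x_m - x_n < 1/10^n$ can be checked exactly for any finite string of digits. The sketch below uses Python's `fractions` module to avoid rounding error; the digit string and the function names `truncation` and `cauchy_bound_holds` are our own choices for illustration. A finite check like this supports the estimate but does not replace the argument above.

```python
from fractions import Fraction

def truncation(digits, n):
    """Return x_n = 0.d_1 d_2 ... d_n as an exact fraction."""
    total = Fraction(0)
    for i, d in enumerate(digits[:n]):
        total += Fraction(d, 10 ** (i + 1))
    return total

def cauchy_bound_holds(digits):
    """Check 0 <= x_m - x_n < 1/10^n for all n < m <= len(digits)."""
    for n in range(1, len(digits)):
        for m in range(n + 1, len(digits) + 1):
            gap = truncation(digits, m) - truncation(digits, n)
            if not (0 <= gap < Fraction(1, 10 ** n)):
                return False
    return True

digits = [1, 4, 2, 8, 5, 7] * 3  # a hypothetical digit string d_1, d_2, ...
print(cauchy_bound_holds(digits))
```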


EXERCISES 10.6 


1. Show directly that CT) isa Cauchy sequence. (This time "directly" means 
do not argue that since this sequence converges it must be a Cauchy 
sequence. ) 


2. (a) If $(a_n)$ and $(b_n)$ are Cauchy sequences, show directly that $(a_n + b_n)$ and
$(a_n b_n)$ are Cauchy sequences.
(b) If $(a_n)$ and $(b_n)$ are Cauchy sequences, is $(a_n/b_n)$ necessarily a Cauchy
sequence?
(c) If $(a_n)$ and $(b_n)$ are Cauchy sequences, give conditions on $(a_n)$ and/or $(b_n)$
that would guarantee that $(a_n/b_n)$ is a Cauchy sequence.


3. (a) Show that a Cauchy sequence of integers must be eventually constant. 


(b) Complete and prove: If $S$ is a set with the property ________, then any


Cauchy sequence of elements of S is eventually constant. 


4. (a) Show that a Cauchy sequence can't have more than one sequential cluster 
point. 


(b) Use Theorem 10.11 to prove Theorem 10.16. 


5. (a) Suppose that $(x_n)$ is a sequence with the property that there is a number $k < 1$
so that $|x_{n+1} - x_{n+2}| < k|x_n - x_{n+1}|$. Such a sequence is called contractive.
Show that a contractive sequence converges.

(b) Give an example of a convergent sequence that is not contractive.

(c) If $f$ is a differentiable function with the property that there is a number $k < 1$
such that $|f'(x)| < k$ for all $x$, show that the sequence given by $x_1 = a$; $x_{n+1} = f(x_n)$
is contractive (and hence converges).


(d) Suppose the hypotheses of (c) are weakened to require only that |f'(x)| < 1 
for all x. Does the result still hold? 


(e) With your calculator set for "radians," and starting with any number, press 
the "cos" key repeatedly. Can you explain this behavior? 


6. Show that Theorem 9.16 fails for Q; that is, find a Cauchy sequence of 
rational numbers whose limit is not rational (since the terms in this sequence 


are also real numbers, the sequence must have a real limit). 


7. Show directly that a bounded, monotone sequence is a Cauchy sequence. 
(Here "directly" means don't use this argument: A bounded monotone 
sequence converges. A convergent sequence is a Cauchy sequence.) 


8. (a) Which of the combinations in the following chart are possible? 


A ... sequence $(a_n)$            with a ... subsequence
-------------------------         -------------------------
BOUNDED or UNBOUNDED              BOUNDED or UNBOUNDED
CONVERGENT or DIVERGENT           CONVERGENT or DIVERGENT
MONOTONE or NOT MONOTONE          MONOTONE or NOT MONOTONE
CAUCHY or NOT CAUCHY              CAUCHY or NOT CAUCHY


(b) Which, if any, of the above combinations must happen? 


(c) Are there any combinations above that can't happen? 


9. Why is the Archimedean property stated explicitly in part (e) of the Big 
Theorem but not in Theorem 10.16? 


10.7 CLOSING THE LOOP 


Our proofs of the results in the lower left corner of the Big Picture so far include 
the solid arrows in this diagram: 


[Diagram: three boxes labeled BOLZANO-WEIERSTRASS THEOREM, BOLZANO-WEIERSTRASS THEOREM FOR SEQUENCES, and CAUCHY SEQUENCES CONVERGE, joined by solid arrows and one dashed arrow.]

We will now fill in the dashed arrow. Recall that part (e) of the Big Theorem 
says "F has the Archimedean property, and a sequence in F converges to an 
element of F if and only if it is a Cauchy sequence," and part (c) says "F has the 
Archimedean property, and every bounded, infinite subset of F has a cluster 


point." The Archimedean property, again, implies itself, and there is nothing 
more to prove there. We will show that "Cauchy sequences converge" implies 
the Bolzano-Weierstrass theorem. More precisely: 


THEOREM 10.17: If $F$ is an ordered field in which every Cauchy sequence
converges to an element of F, then every bounded, infinite subset of F has a 
cluster point that is an element of F. 


PROOF: Let $S$ be a bounded, infinite set. We must construct a Cauchy sequence
converging to a cluster point of $S$. The proof of Theorem 7.5 gives us an idea of
how to find a cluster point, and this proof is a modification of that one. Say
$S \subseteq [a, b]$. Let $S_0 = S$, $I_0 = [a, b]$, and $x_0 \in S_0$. Let $I_1$ be the left half of $I_0$ if its
intersection with $S_0$ is infinite, and the right half if it is not, and let $S_1 = S_0 \cap I_1$.
Then $S_1$ is infinite and so contains a point, $x_1$, different from $x_0$ ($x_0$ may not be in
$S_1$, but $x_1 \in S_0$). We continue in this way (it is a familiar argument), producing
sets $S_0, S_1, \ldots$, intervals $I_0, I_1, \ldots$, and points $x_0, x_1, \ldots$, with the following
properties:

(i) $I_n$ is either the right or left half of $I_{n-1}$.

(ii) $x_n \in S_k$ for all $n \ge k$.

(iii) $S_n \subseteq I_n$

and (iv) $x_n \ne x_m$ for all $m \ne n$.


Only statement (ii) is not immediate. You will prove it in Exercise 10.7.1.


Now (i) implies that the infimum of the lengths of the $I_n$'s is 0. Let $\varepsilon > 0$ be given
and let $N$ be such that the length of $I_N$ is less than $\varepsilon$. If $m, n > N$, we have, by (ii)
and (iii), that $x_m$ and $x_n$ are both elements of $I_N$, and so $|x_m - x_n| < \varepsilon$. Then $(x_n)$ is
a Cauchy sequence and so converges. Call its limit $c$. By (iv), $(x_n)$ is not
eventually constant. By Theorem 9.13, $c$ is a cluster point of $\{x_n\}$, and since
$\{x_n\} \subseteq S$, $c$ is a cluster point of $S$. ∎


EXERCISES 10.7 


1. Prove statement (ii) in the proof of Theorem 10.17.


¹ We will prove the Mean Value theorem in Chapter 12.


Chapter 11 


Compact Sets 


[Diagram: Nested Intervals Property; Heine-Borel Theorem]


11.1 THE EXTREME VALUE THEOREM 


The theoretical side of calculus is mainly a study of continuous functions whose 
domains are closed, bounded intervals. We usually think of the most important 
results (the Intermediate Value theorem and the Extreme Value theorem¹) as
statements about functions, but on a deeper level they have just as much to say 
about the structure of intervals. In this chapter we will examine the property of 
closed, bounded intervals that makes the Extreme Value theorem work. We will 
discuss the Intermediate Value theorem in the next chapter. The Extreme Value 
theorem says: 


If $f$ is a continuous function whose domain is a closed, bounded interval, then $f$
assumes a maximum on its domain. 


"Assumes a maximum" means there is an element of the domain of f, say c, with 
$f(c) \ge f(x)$ for all $x$ in the domain of $f$. This is the same as saying $\sup f(S) \in f(S)$,
where $S$ is the domain of $f$.
Notice, though, that this is different from saying that the range of $f$ is bounded.
The function $f(x) = x$ with domain (0, 1), is bounded (we say a function is
bounded if its range is bounded), but since $\sup f((0, 1)) = 1 \notin f((0, 1))$, $f$ does not
assume a maximum. Remember the distinction between "maximum" and 
"supremum." 


This simple example also shows that the Extreme Value theorem does not 


hold if the domain of the function is an open interval and suggests there might be 
something more at stake here than just the behavior of functions. What is special 
about closed, bounded intervals? We can get insight into this if we think of other 
types of sets where the Extreme Value theorem holds. We want a theorem that 
says: 


If $f$ is a continuous function whose domain is ???, then $f$ assumes a maximum
on its domain.


We see that we can put the words "a finite set" in place of the question marks. 
Any function whose domain is finite is continuous (Exercise 8.7.4). The range of 
such a function is also a finite set, and so it has a largest element. Closed, 
bounded intervals and finite sets share this property, which we give a name. 


DEFINITION 11.1: The set $K \subseteq \mathbb{R}$ is compact if every continuous function
$f: K \to \mathbb{R}$ assumes a maximum.


EXAMPLES 11.1: 1. The Extreme Value theorem says that closed, bounded 
intervals are compact, but we haven't proved this yet (it was "beyond the scope" 
of your calculus course). The proof of the Extreme Value theorem is one of the 
main goals of this chapter. 


2. Finite sets are compact (we have proved this). 


3. The whole real line is not compact. The function f(x) = x is continuous but 
isn't bounded, and so it can't assume a maximum. We can use this same function, 
along with the Archimedean property, to show that N is not compact. 
Compactness is a property of the "large scale" structure of a set, as opposed to 
cluster points, for instance, which describe a set's "small scale" structure. The set 
of natural numbers certainly looks like a finite set when viewed on a small scale, 
but it is not compact. 


4. Let $H = \{1, \frac{1}{2}, \frac{1}{3}, \ldots\}$. Then $H$ is not compact. We can show this by finding one
continuous function on $H$ that doesn't assume a maximum. Note that $f(x) = 1/x$ is
continuous on $H$ ($f$ is continuous on any set that doesn't contain 0), but $f$ is not
bounded on $H$ (in fact, $f(H) = \mathbb{N}$).


5. Let $S = \{0, 1, \frac{1}{2}, \frac{1}{3}, \ldots\}$. Even though $S$ differs only slightly from the set $H$ in the
previous example, $S$ is compact. Let $f$ be a continuous function on $S$. If $f(0) = \sup f(S)$,
we're done ($f$ assumes a maximum at $x = 0$). Otherwise, let $n_0$ be such that
$f(1/n_0) > f(0)$, and let $\varepsilon = f(1/n_0) - f(0) > 0$. The set $B = \{x \in S : f(x) \ge f(0) + \varepsilon\}$ is
finite, since $f$ is continuous at 0, and $B$ is not empty since $1/n_0 \in B$. The
maximum of $f$ on $S$ is just its maximum on $B$, which is attained for an element of
$B$. You will check these statements in Exercise 11.1.2.


It is natural to ask how compactness is related to the usual set operations and the 
properties of sets we already know. We find right off the bat that the proofs of 
some of these results are very straightforward, while others are much more 
elusive. 


THEOREM 11.2: If $A$ and $B$ are compact, then $A \cup B$ is compact.


PROOF: Let $f : A \cup B \to \mathbb{R}$ be continuous. We wish to show that $f$ assumes a
maximum on $A \cup B$. Since $f$ is continuous on $A \cup B$, it is continuous on $A$ and on
$B$ (Exercise 8.7.3). Since $A$ is compact, $f$ assumes a maximum on $A$, say $M_A = f(a)$.
Likewise, $f$ assumes a maximum on $B$, say $M_B = f(b)$. The maximum value
of $f$ on $A \cup B$ is $\max\{M_A, M_B\}$, which is assumed at either $a$ or $b$. ∎


We now may use induction to prove: 


COROLLARY 11.3: The union of any finite collection of compact sets is 
compact. 


PROOF: Left as Exercise 11.1.4. ∎


It is quite a bit more difficult to prove in this way that the intersection of two 
compact sets is compact since a function can be continuous on an intersection of 
two sets without being continuous on either set. We will prove that intersections 
of compact sets are compact when we have more techniques at our disposal. 
(When we get to it, the proof will be one line long—an illustration of the value 
of waiting for the right tool to come along.) 

Compactness is always a bit of a mystery. Perhaps we can relate it to more 
familiar ideas. Example 11.1.4 works as it does because the set H has a cluster 
point that it does not contain. In other words, H is not closed. Example 11.1.3, on 
the other hand, works because the sets involved are not bounded. This leads us to 
suspect: 


THEOREM 11.4: A compact set is closed and bounded.


PROOF: This theorem is of the form $A \Rightarrow (B \text{ and } C)$, and so we must prove both
$A \Rightarrow B$ and $A \Rightarrow C$. We will do both by the contrapositive. First, suppose the set $S$
is not bounded. The function $f(x) = |x|$ is continuous on the whole real line, and
therefore it is continuous on any subset of the real line. But if $S$ is not bounded, $f$
is not bounded on $S$, and so $S$ isn't compact. Now suppose the set $T$ is not closed.
Then it has a cluster point, say $t$, that it does not contain. Let $f(x) = 1/|x - t|$. Then
$f$ is continuous everywhere except at $t$. But $t \notin T$, and so $f$ is continuous on $T$.
Since $t$ is a cluster point of $T$, there are elements of $T$ as close as we like to $t$.
These may be chosen to give values of $f$ as large as we like. Thus $f$ is not
bounded on $T$; and since such a function exists, $T$ is not compact (compare these
proofs with Examples 11.1.3 and 11.1.4). ∎


EXERCISES 11.1 


1. Give an example of a closed, bounded interval and a (discontinuous) function 
defined on it that assumes no maximum value. 


2. Prove the statements made in Example 11.1.5 (see also Exercise 5.1.7). 
3. Prove in detail that every function whose domain is a finite set is continuous. 
4. Prove Corollary 11.3. 


5. (a) Make the argument in the second part of the proof of Theorem 11.4 more 
precise (the proof as stated says "... we can do this ... we can do that ...") 


(b) Restate the proof of Theorem 11.4 so that it is done directly (not by the 
contrapositive). 

6. (a) Show that the union of an infinite collection of compact sets need not be 
compact. 
(b) Show that it is possible for the union of an infinite collection of compact 
sets to be compact. 


7. Show that the definition of "compact" may be restated with "maximum" 
replaced by "minimum." 


8. Show that the difference of two compact sets need not be compact. 


9. Consider whether the definition of compactness is equivalent to saying just 


that every continuous function defined on a set is bounded. 


10. Notice that, in the definition of compactness, the set K need only be a subset 
of a topological space (but not necessarily R). 


(a) If X is an infinite set with the discrete topology, show that X is not 
compact. (This is a bit of a trick question; be careful.) 


(b) If $X$ is any set with the indiscrete topology, show that $X$ is compact.


11.2 THE COVERING PROPERTY 


The converse of Theorem 11.4 would be very useful. Deciding whether a set is 
closed and bounded would seem far easier than deciding whether it is compact. 
Our solution to this problem will lead us into some very abstract mathematics. 
Being abstract often means looking at a problem topologically, that is, finding a 
way to express an idea in terms of open sets. The definition of compactness is 
only partly topological. It involves the (topological) issue of continuous 
functions, but we must also consider the (nontopological) question of the 
ordering of the real line to make sense of "maximums." 


The Covering property is a tricky concept. To get an idea what it is about, let 
us consider two sets we know to be compact (a finite set and the set S in 
Example 11.1.5) and see if they have anything else in common. Our claim that 
"Every function whose domain is a finite set is continuous" is based on bits and 
pieces of other proofs. If we were to assemble a detailed proof of this statement, 
we would see that it hinges on the fact that we can enclose a finite set in a 
collection of open intervals. (That every function on such a set is continuous 
then follows because each point of the set is *open.) 


[Figure: the points a_0, a_1, a_2, a_3, a_4 on the number line, each enclosed in its own open interval.] 


Suppose we suspect that we can cover a set with intervals like this, but we don't 
know exactly where these intervals are, how big they ought to be, or how many 
of them we need. In our desire to be sure the set is covered, we might get carried 
away and bury the set in intervals, like this (!): 


[Figure: the same points a_0, ..., a_4 buried under many overlapping open intervals.] 


Now among these intervals (no matter how many there are and even if there are 
infinitely many), there is at least one that contains a_0. We can pick one of these 
out and label it I_0. Likewise, there is an interval that contains a_1. We can pick 
such an interval out and label it I_1 (it may even be that a_1 is in I_0). We can 
continue in this way and find a finite collection of intervals, chosen from the 
original bunch, that contains the whole set (here there will be no more than five 
intervals chosen). 


What happens if we try this with the set S in Example 11.1.5 (which is 
compact)? If we cover S with open intervals, at least one of them must contain 0. 
By Corollary 6.2.b, this particular interval must also contain another element of 
S, and in fact must contain all but finitely many of the elements of S (be sure you 
see why this is so). Since only finitely many points of S remain uncovered, we 
can pick out finitely many more intervals containing the remaining elements of S 
as in the previous example. 

In contrast, consider the set of natural numbers, which we know is not 
compact. Let I_n = (n − 1/2, n + 1/2) for n = 1, 2, .... These intervals cover N, but 
if we remove even a single one, those that remain will no longer cover N (if we 
remove I_237, for instance, n = 237 will no longer be contained in any of the 
intervals). Certainly no finite collection of these intervals can cover N. We have 
found a property, having something to do with "covering" sets with open 
intervals, that is shared by the two sets we know are compact but not possessed 
by a set we know is not compact. To make our definition purely topological, we 
replace "open intervals" with "open sets": 


DEFINITION 11.5: A collection of open sets {U_α : α ∈ A} is an open cover of 
the set S if S ⊆ ⋃_{α ∈ A} U_α. If {U_α} is an open cover of S, we say that 
we have covered S with {U_α}. 


Remember that in using α as a subscript, we are making no commitment as to 
the cardinality of the index set A. If we were to write {U_n} or {U_i}, we might 
think that A has to be countable, which may not be the case. 

Let us put our observation about finite sets in terms of Definition 11.5. Let A 
= {a_1, a_2, ..., a_n} and suppose {U_α} is an open cover of A. Since a_1 ∈ A, it must 
be that a_1 is in one of the sets U_α. Call this set U_{α_1}. Similarly, we may find sets 
U_{α_2}, ..., U_{α_n} in {U_α} containing a_2, ..., a_n, respectively. Even though 
{U_α} may have infinitely many sets in it, we need only finitely many of them to 
cover A. The collection {U_{α_1}, ..., U_{α_n}} is called a subcover (this particular one is 
a finite subcover). It is extremely important to recognize what has happened 


here. We have NOT said: 

We can cover A with finitely many open sets. 

But we HAVE said: 

We can cover A with finitely many sets chosen from {U_α}. 


At first glance, these statements may not seem very different. Upon reflection, 
though, we find that the first of them doesn't say anything useful at all. The 
whole real line is an open set, and so we can cover anything with finitely many 
open sets (we can do it with just one: the whole real line). Producing a finite 
open cover is not a challenge. It becomes one only when we must cover our set 
with finitely many sets chosen from a previously specified collection. 


DEFINITION 11.6: A set has the covering property if any open cover of it has 
a finite subcover. 


Note that "finite" here refers to the number of sets in the subcover, not to the 
cardinalities of the individual sets. 
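

The two observations of this section can be tried out in a few lines of Python. This is only an illustrative sketch: the names `contains`, `finite_subcover`, and `missed_natural` and the sample data are assumptions made for the example, and a Python list can hold only finitely many intervals, whereas the covers in the text may be infinite. The first function picks, for each point of a finite set, one member of a given cover that contains it. The second exhibits a natural number missed by any finite selection of the intervals I_n = (n − 1/2, n + 1/2).

```python
from typing import Iterable, List, Tuple

Interval = Tuple[float, float]  # the open interval (lo, hi)


def contains(interval: Interval, x: float) -> bool:
    """True when x lies in the open interval."""
    lo, hi = interval
    return lo < x < hi


def finite_subcover(points: List[float], cover: Iterable[Interval]) -> List[Interval]:
    """For each point, keep one member of the cover that contains it."""
    members = list(cover)
    chosen: List[Interval] = []
    for p in points:
        for u in members:
            if contains(u, p):
                if u not in chosen:
                    chosen.append(u)
                break
        else:
            # No member contains p, so the family was not a cover after all.
            raise ValueError(f"{p} is not covered")
    return chosen


def missed_natural(indices: List[int]) -> int:
    """A natural number in none of I_n = (n - 1/2, n + 1/2) for n in indices."""
    return max(indices) + 1


# A five-point set buried under many overlapping intervals (made-up data).
points = [0.0, 1.0, 2.0, 3.0, 4.0]
bury = [(k / 10 - 0.3, k / 10 + 0.3) for k in range(-5, 50)]
subcover = finite_subcover(points, bury)
```

The subcover never has more members than the set has points, which is the "no more than five intervals" remark above. For N the selection fails for the reason given in the text: each natural number lies in exactly one of the intervals I_n.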


EXERCISES 11.2 


1. Show directly that the union of two sets having the covering property has the 
covering property. 


2. In the discussion preceding the definition of an open cover, why is it 
necessary to replace "open intervals" with "open sets" to make the definition 
purely topological? 


3. (a) Show that a closed, bounded set having exactly one cluster point has the 
covering property. 


(b) Show that "closed" and "bounded" are both necessary to make the 
statement in (a) true. 


(c) Show that a closed, bounded set having finitely many cluster points has 
the covering property. 


(d) In (c) you showed that if S is closed and bounded and S′ is finite, then S 
has the covering property. Now show that if S is closed and bounded and S″ 
is finite, then S has the covering property. 


4. Show that a set with the covering property is compact in this way: If the set 
is not compact, there is a continuous function f defined on it that attains no 
maximum. Hence, if f(x) is any value of f, there is a number y in the set with 
f(y) > f(x). Construct an open cover of the set with no finite subcover. 


5. (a) Show that a set of the form [a, ∞) is not compact by displaying a 
continuous function on it having no maximum. 


(b) Show that a set of the form [a, ∞) does not have the covering property by 
displaying an open cover of it having no finite subcover. 


6. (a) Show that the set H in Example 11.1.4 does not have the covering 
property. 

(b) Supply the details of the proof that the set S in Example 11.1.5 has the 
covering property. 


7. (a) Suppose S is an infinite set having the covering property and that {U_α} is 
an open cover of S. Show that there is an α′ so that U_{α′} ∩ S is infinite. 


(b) Give an example of a set and a cover, as in (a), such that there is only one 
such set U_{α′}. 

(c) Give an example of a set S and an open cover {U_α} such that U_α ∩ S is 
infinite for all α, but S does not have the covering property. 


8. (a) Show that an infinite set with the discrete topology fails to have the 
covering property. 


(b) Show that any set with the indiscrete topology has the covering property. 


9. Consider the set consisting of the real numbers and another symbol, say ∞. 
This set is denoted R ∪ {∞}. We say that a subset of R ∪ {∞} is a 
neighborhood of ∞ if it contains the complement of a compact set. 
Neighborhoods of the other elements of the set are defined in the usual way. 


(a) Show that a subset of R ∪ {∞} that does not contain ∞ is open if and only 
if it is an open subset of R (in the usual sense). 


(b) State and prove a condition that says when a subset of R ∪ {∞} that does 
contain ∞ is open. 


(c) Show that the open subsets of R ∪ {∞} form a topology. 

(d) Show that a function f : R ∪ {∞} → R is continuous if (i) it is continuous 
on R in the usual sense, and (ii) f(∞) = lim_{x→∞} f(x) = lim_{x→−∞} f(x) (where 
both of these limits are taken in the sense of ordinary calculus). 

(e) Show that f : R ∪ {∞} → R ∪ {∞} given by 


f(x) = 1/x if x ≠ 0, ∞ 
f(x) = ∞ if x = 0 
f(x) = 0 if x = ∞ 


is continuous, while g : R ∪ {∞} → R ∪ {∞} given by 


is not. 
(f) Show that R ∪ {∞} is compact. 
(g) Show that R ∪ {∞} has the covering property. 


(h) The set R ∪ {∞} is called the one-point compactification of R. Show that 
this procedure works for any noncompact topological space. That is, if X is 
such a space, consider the set X ∪ {∞}, where a subset is considered a 
neighborhood of ∞ if it contains the complement of a compact subset of X. 
Show that X ∪ {∞} is a compact topological space. 


(i) What happens if you construct the one-point compactification of a set that 
is already compact? 


10. Now consider the real numbers together with two new symbols, say +∞ and 
−∞ (these are just two symbols, the signs don't indicate arithmetic 
operations). We will say that a set is a neighborhood of +∞ if it contains the 
complement of a closed set that is bounded above, and that a set is a 
neighborhood of −∞ if it contains the complement of a closed set that is 
bounded below. 


(a) Show that a subset of R ∪ {−∞, +∞} that does not contain −∞ or +∞ is 
open if and only if it is an open subset of R (in the usual sense). 


(b) State and prove a condition that determines whether a subset of 
R ∪ {−∞, +∞} that does contain −∞ or +∞ is open. 


(c) Show that the open subsets of R ∪ {−∞, +∞} form a topology. 
(d) Show that a function f : R ∪ {−∞, +∞} → R is continuous if (i) it is 
continuous on R in the usual sense, (ii) f(−∞) = lim_{x→−∞} f(x), and (iii) f(+∞) = 
lim_{x→+∞} f(x). 
(e) Show that g : R ∪ {−∞, +∞} → R ∪ {−∞, +∞} given by 


g(x) = e^x if x ≠ −∞, +∞ 
g(x) = +∞ if x = +∞ 
g(x) = 0 if x = −∞ 


is continuous. 


a 
(f) Can the function f : R ∪ {−∞, +∞} → R ∪ {−∞, +∞} given by lS for 
x ∈ R be defined at 0, −∞, and +∞ in such a way as to make it continuous? 


(g) Show that R ∪ {−∞, +∞} is compact. 
(h) Show that R ∪ {−∞, +∞} has the covering property. 


(i) R ∪ {−∞, +∞} is the two-point compactification of R. This procedure 
does not make sense in the general topological setting. Why? 


(j) Show that any rational function on the real numbers can be extended in 
such a way that it is continuous as a function from R ∪ {−∞, +∞} to 
R ∪ {−∞, +∞}. 


11.3 THE HEINE-BOREL THEOREM 


In our last example above, we saw that the set of natural numbers does not have 
the covering property. Here is another example: Let I = (0, 1) and let U_n = (1/n, 
2) for n = 1, 2, .... Then I ⊆ ⋃_n U_n, but no finite collection of the sets U_n covers I 
(be sure you see why this is so). In these examples we can see the essence of the 
theorem we seek, called the Heine-Borel Theorem, which says: A set is closed 
and bounded if and only if it has the covering property. More precisely, we will 
show: 


THEOREM 11.7: If F is an Archimedean ordered field having the Nested 
Intervals property, then the following is also true: A subset of F is closed and 
bounded if and only if it has the covering property. 


This theorem has the most complicated logical structure of any we have seen, 
and so we will examine it closely before we begin the proof. The theorem is of 
the form (A ⇒ B) ⇒ (C ⇔ D), where 

A: {I_n} is a nest of closed, bounded intervals 

B: ⋂_n I_n ≠ ∅ 

C: The set S is closed and bounded 

D: S has the covering property 

Since there is an "and" in the conclusion (where?), the theorem has two parts: 


(A ⇒ B) ⇒ (D ⇒ C) and (A ⇒ B) ⇒ (C ⇒ D), 

which (by Exercise 1.4.6.a) are in turn equivalent to 


((A ⇒ B) and D) ⇒ C and ((A ⇒ B) and C) ⇒ D. 
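

The claimed equivalence of the three forms can be checked mechanically with a truth table. The sketch below is only a sanity check under the assumption that A, B, C, D are treated as independent truth values; the function names are invented for the example, and the check is no substitute for Exercise 1.4.6.a.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    """Material implication: p => q."""
    return (not p) or q


def original_form(a: bool, b: bool, c: bool, d: bool) -> bool:
    # (A => B) => (C <=> D)
    return implies(implies(a, b), c == d)


def two_part_form(a: bool, b: bool, c: bool, d: bool) -> bool:
    # ((A => B) => (D => C)) and ((A => B) => (C => D))
    nip = implies(a, b)
    return implies(nip, implies(d, c)) and implies(nip, implies(c, d))


def flattened_form(a: bool, b: bool, c: bool, d: bool) -> bool:
    # (((A => B) and D) => C) and (((A => B) and C) => D)
    nip = implies(a, b)
    return implies(nip and d, c) and implies(nip and c, d)


rows = list(product([False, True], repeat=4))
forms_agree = all(
    original_form(*r) == two_part_form(*r) == flattened_form(*r) for r in rows
)
```

All sixteen assignments give the same truth value for the three forms, so proving parts (1) and (2) below does prove the theorem as stated.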


We must prove: (1) If the Nested Intervals property holds (A ⇒ B) and S has the 
covering property (D), then S is closed and bounded (C). (This one is easy.) 


(2) If the Nested Intervals property holds and the set S is closed and bounded, 
then S has the covering property. (This is much more significant.) 


PROOFS: (1) Suppose that S has the covering property and let 
U_n = (−n, n) for n ∈ N. Then {U_n} is a cover of S (it is a cover of the whole 
line), and since S has the covering property some finite collection of these sets 
must cover S. The union of any finite collection of the sets U_n is bounded (the 
union is just the one with the largest subscript), and so S is contained in a 
bounded set and S is bounded. 


Now suppose S has the covering property and p ∉ S. We will show that p is 
not a cluster point of S (this will tell us that S is closed). Let 
U_n = R \ [p − 1/n, p + 1/n] for n ∈ N. Then U_n is open for each n and 
⋃_n U_n = R \ {p} (why?). 


Since p ∉ S, {U_n} is a cover of S. Since S has the covering property, there is a 
finite collection of the sets {U_n} that covers S. One of the sets in this collection 
will have the largest subscript, call it n_0. The union of this finite collection 
contains S but contains no element of [p − 1/n_0, p + 1/n_0]. Hence 
(p − 1/n_0, p + 1/n_0) ∩ S = ∅, and p is not a cluster point of S. 


(2) This classic proof proceeds by contradiction. Suppose S is a closed, bounded 
set and that {U_α} is an open cover of S having no finite subcover. We will say 
that a subset T of S has "property B" (for "bad") if no finite subcollection of 
{U_α} covers T. Note that S itself has B. It is important to understand property B 
clearly. In order for T to have B, it must be impossible to cover T with finitely 
many sets chosen from {U_α}. 

Since S is bounded, it may be contained in a closed, bounded interval, say I_0. 
Let S_0 = S = I_0 ∩ S. Divide I_0 in the middle and call the left half I_L and the right 
half I_R. One or the other (or both) of S_0 ∩ I_L or S_0 ∩ I_R must have B (if neither of 
them had B, then S_0 would not have B). Let I_1 be I_L if S_0 ∩ I_L has B and I_R 
otherwise, and let S_1 = I_1 ∩ S. Everything we have said about S_0 is true of S_1, and 
so we may repeat this process to obtain sets S_2, S_3, ..., and a nest of closed, 
bounded intervals {I_n}. The length of each interval I_n is half that of the previous 
one, and so the infimum of their lengths is 0. By the Nested Intervals property, 
there is a real number s so that ⋂_n I_n = {s}. Now s is a cluster point of S (this 
follows as in the proof of the Bolzano-Weierstrass theorem; notice that a set 
with property B must be infinite). Since S is closed, we have s ∈ S. Thus there is 
an element of the cover, say U_{α*}, with s ∈ U_{α*}. Now U_{α*} is open, and so there is an 
ε > 0 with (s − ε, s + ε) ⊆ U_{α*}. The infimum of the lengths of the intervals I_n is 0, 
and so there is one, say I_{n*}, with length less than ε (in fact, all but finitely many 
of them have this property). Then S_{n*} ⊆ I_{n*} ⊆ (s − ε, s + ε) ⊆ U_{α*}, contradicting the 
way the sets S_n were chosen (S_{n*} fails to have property B in a big way since it 
can be covered by one set from {U_α}). ∎ 
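

The bisection in part (2) can be run forward as a procedure rather than backward as a contradiction. The sketch below is an illustration under stated assumptions, not the proof itself: it works on a closed interval [lo, hi] instead of a general closed, bounded set, it uses floating-point numbers, and the cover is supplied through a hypothetical oracle `find_member(lo, hi)` that returns one cover member containing the whole piece or `None` if it finds none. The depth cap is an artificial stand-in for "this piece still has property B."

```python
from typing import Callable, List, Optional, Tuple

Interval = Tuple[float, float]  # the open interval (lo, hi)
Oracle = Callable[[float, float], Optional[Interval]]


def bisect_subcover(lo: float, hi: float, find_member: Oracle,
                    max_depth: int = 40) -> Optional[List[Interval]]:
    """Finitely many cover members whose union contains [lo, hi], or None."""
    member = find_member(lo, hi)
    if member is not None:
        return [member]            # one set covers this piece: not "bad"
    if max_depth == 0:
        return None                # still "bad" after repeated halving
    mid = (lo + hi) / 2
    left = bisect_subcover(lo, mid, find_member, max_depth - 1)
    if left is None:
        return None
    right = bisect_subcover(mid, hi, find_member, max_depth - 1)
    if right is None:
        return None
    return left + [m for m in right if m not in left]


def fixed_radius_member(lo: float, hi: float) -> Optional[Interval]:
    """Oracle for the (uncountable) cover {(c - 0.05, c + 0.05) : c real}."""
    c = (lo + hi) / 2
    member = (c - 0.05, c + 0.05)
    return member if member[0] < lo and hi < member[1] else None


def shrinking_member(lo: float, hi: float) -> Optional[Interval]:
    """Oracle for U_n = (1/n, 2), which covers (0, 1) but not the point 0."""
    if lo <= 0 or hi >= 2:
        return None
    n = int(1 / lo) + 1            # then 1/n < lo
    return (1.0 / n, 2.0)


good = bisect_subcover(0.0, 1.0, fixed_radius_member)
bad = bisect_subcover(0.0, 1.0, shrinking_member, max_depth=20)
```

With the first oracle the halving stops after four levels and returns sixteen intervals covering [0, 1]. With the second, the piece touching 0 is never contained in a single U_n, so the procedure gives up, mirroring the fact that the sets U_n have no finite subcollection covering (0, 1).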


So far we have proved implications like this: 


S is compact 
⇓ 
S is closed and bounded 
⇓ ⇑ 
S has the covering property 


(with the bottom ⇓ depending on the Nested Intervals property). Now we will 
show that the various properties are equivalent by showing that the covering 
property implies compactness. 


We will first prove something called a "preservation theorem." A 
preservation theorem is one that says "If a set has This property and you do That 
to it, the result will also have This property" (This is preserved under That). 


THEOREM 11.8: If the set S has the covering property and f : S → R is 
continuous, then f(S) has the covering property. 


PROOF: We will use the forward-backward method (in this proof there is a lot 
of "forward" and not much "backward"). It is f(S) we wish to show has the 
covering property, and so our proof should begin and end like this: 


Let {U_α} be an open cover of f(S). 


Then {U_{α_k} : k = 1, ..., m} is a cover of f(S). 


We now have open sets in the range of the continuous function f. This collection 
of words together suggests strongly that we apply the definition of continuity: 


Let {U_α} be an open cover of f(S). 
→ Consider the open sets {f⁻¹(U_α)}. 


* * * 


Then {U_{α_k} : k = 1, ..., m} is a cover of f(S). 


All we know about S is that it is contained in the domain of f and has the 
covering property. Now {f⁻¹(U_α)} is a collection of open sets contained in the 
domain of f. We should check whether {f⁻¹(U_α)} is a cover of S. 


Let {U_α} be an open cover of f(S). 
Consider the open sets {f⁻¹(U_α)}. 


→ Let s ∈ S. 
→ Then f(s) ∈ f(S) [by definition of f(S)]. 
→ So f(s) ∈ U_{α_0} for some α_0 [because {U_α} is a cover of f(S)] and s ∈ f⁻¹(U_{α_0}) 
[by definition of f⁻¹(U_{α_0})]. 
→ Hence {f⁻¹(U_α)} is an open cover of S. 
Let {U_α} be an open cover of f(S). 
Consider the open sets {f⁻¹(U_α)}. 


Let s ∈ S. 
Then f(s) ∈ f(S) [by definition of f(S)]. 
So f(s) ∈ U_{α_0} for some α_0 [because {U_α} is a cover of f(S)] and s ∈ f⁻¹(U_{α_0}) 
[by definition of f⁻¹(U_{α_0})]. 
Hence {f⁻¹(U_α)} is an open cover of S. 


→ Since S has the covering property, there is a finite subcover 
{f⁻¹(U_{α_1}), ..., f⁻¹(U_{α_m})}. 


Let {U_α} be an open cover of f(S). 
Consider the open sets {f⁻¹(U_α)}. 


Let s ∈ S. 
Then f(s) ∈ f(S) [by definition of f(S)]. 
So f(s) ∈ U_{α_0} for some α_0 [because {U_α} is a cover of f(S)] and s ∈ f⁻¹(U_{α_0}) 
[by definition of f⁻¹(U_{α_0})]. 
Hence {f⁻¹(U_α)} is an open cover of S. 


Since S has the covering property, there is a finite subcover 
{f⁻¹(U_{α_1}), ..., f⁻¹(U_{α_m})}. 


→ Let t ∈ f(S). 
→ There is an element s of S with t = f(s). 
→ s is in one of the sets f⁻¹(U_{α_1}), ..., or f⁻¹(U_{α_m}). 
→ So t is in the corresponding set U_{α_1}, ..., or U_{α_m}. 


Then {U_{α_k} : k = 1, ..., m} is a cover of f(S). 
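

The last steps of the proof of Theorem 11.8 can also be read as a single chain of inclusions. The display below is a restatement in LaTeX, using two standard facts about images and preimages that the line-by-line version uses implicitly: the image of a union is the union of the images, and f(f⁻¹(U)) ⊆ U for any set U.

```latex
\[
  S \subseteq \bigcup_{k=1}^{m} f^{-1}(U_{\alpha_k})
  \quad\Longrightarrow\quad
  f(S) \subseteq f\Bigl(\bigcup_{k=1}^{m} f^{-1}(U_{\alpha_k})\Bigr)
       = \bigcup_{k=1}^{m} f\bigl(f^{-1}(U_{\alpha_k})\bigr)
       \subseteq \bigcup_{k=1}^{m} U_{\alpha_k}.
\]
```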


Several of the exercises in Section 11.2 hint at the following theorem. 


THEOREM 11.9: If S has the covering property, it is compact. 


PROOF: We must show that any continuous, real-valued function f whose 
domain is S attains a maximum. Since S has the covering property, f(S) has the 
covering property, and so f(S) is closed and bounded. We revise a familiar 
argument: Since f(S) is bounded, there exist a_0, b_0 such that f(S) ⊆ [a_0, b_0] = I_0. 
Divide I_0 in half in the middle. Let I_1 be the right half of I_0 if that half contains 
any element of f(S), and the left half otherwise. Continue in this manner, 
producing a nest of closed, bounded intervals with the properties: 
(1) I_n ∩ f(S) ≠ ∅ for all n, and (2) if I_n is the left half of I_{n−1}, the right half of I_{n−1} 
contains no element of f(S). Let {u} = ⋂_n I_n. You will show in Exercise 11.3.2 
that u = sup f(S).² Since f(S) is closed, it contains its supremum (Exercise 8.3.1); 
that is, sup f(S) ∈ f(S). Thus S is compact. ∎ 
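

The interval-halving rule in this proof ("take the right half if it contains an element of the set, otherwise the left half") is easy to simulate. The sketch below is a numerical illustration only. It assumes a finite list of sample values stands in for f(S), and the function name `bisect_to_sup` and the sample function x(1 − x) are invented for the example.

```python
from typing import List, Tuple


def bisect_to_sup(values: List[float], a0: float, b0: float,
                  steps: int = 50) -> Tuple[float, float]:
    """Halve [a0, b0] repeatedly, preferring the right half when it meets the set.

    Assumes every value lies in [a0, b0]. Each interval produced contains a
    value, and no value lies to the right of it.
    """
    lo, hi = a0, b0
    for _ in range(steps):
        mid = (lo + hi) / 2
        if any(mid <= v <= hi for v in values):
            lo = mid               # right half contains an element
        else:
            hi = mid               # right half is empty: keep the left half
    return lo, hi


# Sample values of f(x) = x(1 - x) on a grid in [0, 1] (illustrative data).
samples = [(k / 1000) * (1 - k / 1000) for k in range(1001)]
lo, hi = bisect_to_sup(samples, 0.0, 1.0)
```

The final interval is tiny and still traps the largest sample value, which is the role played by the point u in the proof.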


It is now a simple thing to extend Theorem 11.2 to intersections: 


COROLLARY 11.10: The intersection of any collection of compact sets is 
compact. 


PROOF: The intersection of any collection of closed, bounded sets is closed 
and bounded. (This is a corollary to the whole circle of results, not just to 
Theorem 11.9.) ∎ 


We have three equivalent characterizations of compactness: the definition, the 
covering property, and "closed and bounded." We may use "compact" to refer to 
any one of them. It is traditional to take the covering property as the definition 
of compactness since it is more "topological." If we do this, the Heine-Borel 
theorem also assumes its traditional form: 


A set is compact if and only if it is closed and bounded. 


Since the covering property is equivalent to compactness, we have already 
established the following preservation theorem. The direct proof of it is still 
instructive in its simplicity. Be sure you see why this proof begins the way it 
does. 


THEOREM 11.11: If the set S is compact and f : S → R is continuous, then f(S) 
is compact. 


PROOF: Let g : f(S) → R be continuous. Then g ∘ f : S → R is continuous and, 
since S is compact, g ∘ f attains a maximum value, say g(f(s)). Since f maps S 
onto f(S), there are no inputs to g that are not of the form f(x), and g can assume 
no value larger than g(f(s)). Thus g attains a maximum at f(s), and f(S) is 
compact. ∎ 


EXERCISES 11.3 


1. Verify the statement in the proof of Theorem 11.7 that a set with property B 
must be infinite. 


2. Show that the point u obtained in the proof of Theorem 11.9 is in fact sup f(S). 
(In doing this, you will show directly that the Nested Intervals property 
implies the Least Upper Bound property.) 


3. (a) Complete the proof of the example preceding Theorem 11.7. 


(b) Where is the Archimedean property used in the proof of the Heine-Borel 
theorem? 


4. (a) Suppose S is closed and bounded and T is a closed subset of S. Show that 
T is closed and bounded. 


(b) Suppose S has the covering property and T is a closed subset of S. Show 
directly that T has the covering property. 


(c) Suppose S is compact and T is a closed subset of S. Show directly that T is 
compact. (This is very difficult.) Parts (a), (b), and (c) all say the same thing. 
Which was easiest? 


5. (a) Construct an example of an open cover of the whole real line that has 
uncountably many sets in it and has no finite subcover. 


(b) Show that, for any natural number n, the interval [−n, n] can be covered 
by finitely many of the sets you found in (a). 


(c) Show that the real line can be covered by countably many of the sets you 
found in (a). 

(d) Show that the result of (c) is true in general: Any open cover of R can be 
reduced to a countable subcover. [This is not quite as useful as if R were 
compact, but it does say something (however mysterious) about its topology. 
A topological space with the property that any open cover can be reduced to a 
countable subcover is called a Lindelöf space.] 


6. (a) Show that the set made up of the range of a convergent sequence together 
with its limit is closed and bounded. 


(b) Show directly that the set made up of the range of a convergent sequence 
together with its limit has the covering property. 


(c) Give an example of a divergent sequence whose range is compact. 


7. Suppose K is compact, {U_α} is an open cover of K with finite subcover {U_{α_n}}, 
and x ∈ K. Let ε_x = sup{ε : (x − ε, x + ε) ⊆ U_{α_n} for some n}. 
(a) Show that ε_x > 0 for all x ∈ K. 
(b) Argue that the function f : K → R defined by f(x) = ε_x is continuous. 


(c) Show that f(x) assumes a positive minimum on K. 


(d) Show that (c) can be interpreted in this way: For any open cover of a 
compact set K, there is a positive number δ so that any interval of length less 
than δ and containing a point of K is contained in a single element of the 
cover. This result is called the Lebesgue Number Lemma, and δ is called 
the Lebesgue Number of the cover. We have used a good bit of our 
knowledge of compact sets to do this proof. A direct proof is more difficult. 


8. Show that a set is compact if and only if any sequence contained in it has a 
subsequence that converges to an element of the set. 


9. Show that an analogue of the Nested Intervals property holds for compact 
sets, that is, if K_1 ⊇ K_2 ⊇ ... is a nest of nonempty compact sets, then 
⋂_n K_n ≠ ∅. 


10. Show that an infinite compact set must have a cluster point that is in the set. 


11. (a) Show that a nonempty compact set has and contains a supremum and 
infimum. 


(b) If your proof for (a) began with the phrase "Since the set is compact, it is 
closed and bounded ...," do another proof based only on the definition of 
compactness. If you used the definition to do part (a), do another proof in 
which you begin by noting that a compact set is closed and bounded. 


11.4 CLOSING THE LOOP 


THEOREM 11.12: An ordered field in which the Heine-Borel theorem holds 
also has the Archimedean property and the Nested Intervals property. 


PROOF: The Archimedean property is the same as the statement that N is 
unbounded. We have seen that N fails to have the covering property. By the 
Heine-Borel theorem, this means that N is either not closed or not bounded. But 
we know that N is closed, and so it must be that N is unbounded. 

Now suppose the Nested Intervals property fails. Then there is a nest of 
closed, bounded intervals, {I_n}, with ⋂_n I_n = ∅. Let U_n = R \ I_n for 
each n. Then {U_n} is an open cover of the whole real line. (This is the only 
tricky part of the proof: the points not in ⋃_n U_n are the points in ⋂_n I_n, a 
set we have assumed to be empty!) So {U_n} is an open cover of, among other 
things, I_1, and I_1 is closed and bounded. Any finite subcollection of {U_n} will 
have a set with the largest subscript, say n_0. The union of the finite collection is 
just U_{n_0}. The midpoint of I_{n_0}, an element of I_1, is not contained in U_{n_0}. Hence I_1 is 
not covered by any finite subcollection of {U_n}, and so it is closed and bounded 
but not compact, a contradiction. ∎ 


Here again we see that the negation of the conclusion gives us specific useful 
information, making the proof by contradiction a good bet. We did not use the 
full power of the Heine-Borel theorem here. We needed to know only that a 
closed, bounded interval has the covering property. You will show this, in a 
sense, in Exercise 11.4.2, but remember that in Exercise 11.3.7 you used the 
Heine-Borel theorem to obtain the Lebesgue number lemma. Here is a case 
where a very general result is easier to obtain than a specific one, a situation that 
is not as rare as one might expect. Theories are organized (in hindsight) to 
include only those ideas crucial to the discussion. Compactness is essentially 
topological, and to deal with intervals (and the attendant order structure) tends to 
obscure the real issues. 


Finally, note that the lower right corner of the Big Picture is only an 
abbreviation for what has actually happened. Our proof has really gone like this: 


EXERCISES 11.4 


1. (a) Construct a closed, bounded subset of the field of formal rational 
functions that does not have the covering property. 


(b) If F is any non-Archimedean ordered field, must there necessarily be a 
closed, bounded subset of F that does not have the covering property? 


2. (a) Find a closed, bounded subset S of Q and a function f : S → R such that f 
is continuous but does not attain a maximum on S. (Note that a subset of Q is 
closed if it is *closed, that is, if it is the intersection of a closed set with Q.) 


(b) Find a closed, bounded subset S of Q and an open cover of S with no 
finite subcover. 


3. Recall the definition of "metric" given in Exercises 4.7.4. 


(a) Let X = P(R), the power set of R. For A, B ∈ X, we define the "distance" 
between two sets to be d(A, B) = inf{|a − b| : a ∈ A, b ∈ B}. Is d a metric on X? 
Does d satisfy any of the conditions for being a metric? 

(b) Give a condition that will guarantee that d(A, B) = 0 whether or not A = 
B. 

(c) Show that the function d(A, B) can be equal to 0 even if A and B are 
disjoint. 

(d) Show that if A and B are disjoint and compact, then d(A, B) > 0. 


(e) Is d a metric on the collection of compact sets? 


4. Use the Lebesgue Number lemma to show that a closed, bounded interval 
has the covering property. (You will need the Lebesgue Number lemma to do 
this. If you think you have done a proof without it, examine your argument 
very carefully.) 


1 There is a discussion of why these are the most important results in calculus at the end of the next 
chapter. 


2 The previous few lines are an outline of the proof that the Nested Intervals property implies the Least 
Upper Bound property. We might have simply said "S is bounded above, and so it has a supremum," but we 
must do the proof as we have in view of the theorem's position in the Big Picture. 
12.1 THE INTERMEDIATE VALUE THEOREM 


Consideration of the Extreme Value theorem led us in the last chapter to the idea 
of compactness. Now we will think about the Intermediate Value theorem in the 
same way. The Intermediate Value theorem says: 


If f is a continuous, real-valued function defined on an interval, and f takes on a 
positive value at some point a and a negative value at some point b, then there is 
a point c between a and b where f(c) = 0. 


(Sometimes a and b are required to be the endpoints of the interval, but it is not 
necessary. This form of the theorem is more useful for our purposes.) 


Like the Extreme Value theorem, the Intermediate Value theorem has much 
to tell us about both intervals and functions. We will say that a function has the 
intermediate value property on a set if it takes on the value 0 somewhere in the 
set between any two points where its values have opposite signs. Evidently, this 
depends on both the function and the set. The Intermediate Value theorem says 
that any continuous function has the intermediate value property on any interval. 


When we began our examination of the Extreme Value theorem, we looked 
for examples of sets other than intervals for which the conclusion of the theorem 
held. This is not a good way to begin discussion of the Intermediate Value 
property, for reasons that will soon be apparent. Instead, let's think of examples 
where the conclusion does not hold. 


EXAMPLES 12.1: 1. Let A = {0, 1}, and let f be given by f(0) = −1 and f(1) = 
1. Then f is continuous on A (Exercise 8.7.4), but there is no x ∈ A where f(x) = 0. 


2. Let B = [0, 1] ∪ [2, 3] and let f = −1 on [0, 1] and 1 on [2, 3]. Again, f is 
continuous and changes sign on B, but there is no x ∈ B where f(x) = 0. 


Each of these sets has, roughly speaking, a "hole" that allows the function to 
change sign without ever being 0. In the next two examples we see that the hole 
doesn't have to be very big. 


3. Let C = (0, 1) ∪ (1, 2) and f(x) = x − 1. Then f is continuous, negative at x = 1/2, 
and positive at x = 3/2, but isn't zero anywhere in C. 


4. Let f : Q → R be given by f(x) = x² − 2. Then f is continuous, it is negative at 
x = 0 and positive at x = 2, but f is not zero at any element of its domain. 


The last example suggests some of the significance of the Intermediate Value 
theorem. If we are allowed to think only of rational numbers, we can't even solve 
a simple equation like x² − 2 = 0. If we can't solve equations, there is little point 
in doing mathematics. The Intermediate Value theorem tells us that certain 
equations have solutions and narrows down where those solutions might be. 
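

Example 4 can be watched in action with exact rational arithmetic. The sketch below is an illustration, assuming Python's standard `fractions` module as a model of Q; the function names are chosen for the example. It repeatedly halves an interval on which f(x) = x² − 2 changes sign. Every midpoint is rational, so f never vanishes there, yet the interval around the sign change shrinks as far as we care to go.

```python
from fractions import Fraction
from typing import Tuple


def f(x: Fraction) -> Fraction:
    return x * x - 2


def bisect_sign_change(lo: Fraction, hi: Fraction,
                       steps: int) -> Tuple[Fraction, Fraction]:
    """Halve [lo, hi] while keeping f(lo) < 0 < f(hi)."""
    assert f(lo) < 0 < f(hi)
    for _ in range(steps):
        mid = (lo + hi) / 2
        value = f(mid)
        if value == 0:             # cannot happen for rational mid
            return mid, mid
        if value < 0:
            lo = mid
        else:
            hi = mid
    return lo, hi


lo, hi = bisect_sign_change(Fraction(0), Fraction(2), 30)
```

After thirty halvings the endpoints are rationals less than 2⁻²⁸ apart with f negative at one and positive at the other. Within Q the sign change happens "in a hole"; within R the Intermediate Value theorem says the hole is filled.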

If we think of the sets in these examples as bits of string on the number line, 
there are places where we can take hold of the sets and pull them apart without 
having to break anything. These sets seem to be "disconnected." Hence our 
definition: 


DEFINITION 12.1: A set C is connected if every continuous, real-valued 
function f : C → R has the Intermediate Value property on C. 


We have seen examples of sets that are not connected. Here are some connected 
sets: 


EXAMPLES 12.1: 5. The Intermediate Value theorem (which we have yet to 
prove!) says that intervals are connected. 


6. A set with a single point is connected. Any function on such a set is 
continuous and has the Intermediate Value property. (It is impossible for the 
"values" of the function to have opposite signs.) 


The following theorem tells us that examples of connected sets can't be any more 
complicated than this. 


THEOREM 12.2: If S is a connected subset of the real line, then S is an 
interval. 


PROOF: Recall that there are nine types of intervals: (a, b), [a, b], (a, b], [a, b), 
(a, ∞), [a, ∞), (−∞, b), (−∞, b], and (−∞, ∞), where a and b are real numbers. We 
can select one of these by specifying whether it is bounded above or below and 
whether it contains its infimum or supremum. Here we will prove that a 
bounded, connected subset of the real line that contains neither its infimum nor 
its supremum is of the form (a, b) for some real numbers a and b. (You are 
invited to state and prove the other eight parts of the theorem.) Let S be such a 
set. Since S is bounded, it has an infimum and a supremum, say a and b. We 
want to show that S = (a, b). This is a set-equality problem, and so we know 
where we must begin: Let z ∈ (a, b). Since a < z < b, the properties of the 
infimum and supremum guarantee there are elements x_1 and x_2 of S with a < x_1 
< z < x_2 < b. Now let f(x) = x − z. Then f is continuous, and takes on a negative 
value at x_1 ∈ S and a positive value at x_2 ∈ S. Since S is connected, there is a 
point t ∈ S with f(t) = 0. But f(x) = 0 only for x = z. Hence z = t ∈ S, and (a, b) ⊆ S. 
By the way a and b were chosen, we know that S ⊆ [a, b]. But we are assuming 
that S contains neither a nor b, meaning that S ⊆ (a, b), and so S = (a, b). ∎ 


Here we used the Intermediate Value property not to show that some point 
existed (we already knew z existed), but to say where it was. When you've 
completed the other eight parts of the proof, you will know that the only 
connected subsets of the real line are intervals. The Intermediate Value theorem, 
conversely, says that all intervals are connected, but its proof must wait. Notice 
that Theorem 12.2 leaves open the possibility that, except for sets with a single 
point, there are no connected subsets of the real line at all. This may seem very 
odd, but we will show shortly that the only connected subsets of the rational 
numbers are sets with just one element. 


EXERCISES 12.1 


1. Complete the proof of Theorem 12.2. 


2. Explain why Theorem 12.2 leaves open the possibility that there is no 
connected set consisting of more than one point. 


3. Show that a set S is connected if and only if it has the property that whenever 
x < z < y and x and y are elements of S, then z is an element of S. 


4. (a) If the topological space X has the indiscrete topology, show that every 
subset of X is connected. 


(b) If X has the discrete topology, show that the only connected subsets of X 
are those having only one element. 


12.2 DISCONNECTIONS 


If a set S is not connected (we say disconnected), there is a continuous, 
real-valued function on S, say g, that takes on a negative value at some point 
a ∈ S and a positive value at some point b ∈ S but is not 0 in S anywhere between 
a and b. By making the following adjustment, we may assume that in this case 
there is a function with domain S that never takes on the value 0. Suppose g, a, 
and b are as above and that a < b. Let 


    f(x) = g(a)   if x < a, 
           g(x)   if a ≤ x ≤ b, 
           g(b)   if x > b. 


Then f is continuous on the whole real line, fails to have the Intermediate Value 
property, and never takes on the value 0. Consider the sets A = f⁻¹((−∞, 0)) and 
B = f⁻¹((0, ∞)). Observe that: 


(1) A and B are both open since (−∞, 0) and (0, ∞) are open and f is continuous. 


(2) A and B are disjoint since if x ∈ A then f(x) < 0, while if x ∈ B then f(x) > 0, 
and f(x) can't be both positive and negative. 


(3) A and B each contain at least one point of S since f(a) < 0 [so a ∈ A] and 
f(b) > 0 [so b ∈ B]. 


(4) S ⊆ A ∪ B since A ∪ B contains all points except those where f(x) = 0, and we 
are assuming there are none of these. 


We can assemble these observations into a theorem. 


THEOREM 12.3: The set S ⊆ R is disconnected if and only if there are open 
sets A and B such that: 


(i) A and B are disjoint 
(ii) A ∩ S ≠ ∅ and B ∩ S ≠ ∅ 
and (iii) S ⊆ A ∪ B 


If this is the case, we say A and B are a disconnection of S. Note that (ii) implies 
that neither A nor B is empty. 


PROOF: We have essentially completed the "only if" part of this proof in the 
discussion above. Suppose two such sets can be found. Let a ∈ A ∩ S and 
b ∈ B ∩ S. We may assume a < b. Since A is an open set, it is a union of open 
intervals (Theorem 8.11). Suppose a ∈ (c, d), one of the component intervals of 
A. Now d ≤ b since b ∉ A. In our proof of Theorem 8.11, we saw that d ∉ A. Nor 
can d belong to B, since B is open and disjoint from A while every neighborhood 
of d meets (c, d) ⊆ A. (In particular d ≠ b, so d < b.) Since S ⊆ A ∪ B, it follows 
that d ∉ S. The function f(x) = x − d is continuous on S, f(a) < 0 and f(b) > 0, 
but f(x) never takes on the value 0 on S. Thus f fails to have the Intermediate 
Value property on S. Since such a function exists, S is disconnected. ■ 


We have made free use of the Least Upper Bound property in these proofs, but 
this is acceptable when we consider the position of connectedness in the Big 
Picture. The idea of disconnectedness will be as useful to us as connectedness. 
While connectedness is entangled with the order structures of the real line (with 
its references to intervals and uses of the words "positive," "negative," and 
"between"), disconnectedness is purely topological. One effect of this will be 
that many of our proofs concerning connectedness will be done by contradiction 
or contrapositive. This has often been the case when the negation of a definition 
gives us better information than the definition itself. While the definition of 
connectedness tells us something mysterious about continuous functions, 
Theorem 12.3 gives us something much more tangible (the sets A and B). 


EXERCISES 12.2 


1. Show that a set is disconnected if and only if it has a proper nonempty subset 
that is both “open and “closed. 


2. Show that the field of formal rational functions is disconnected. 


3. Show that the function f, defined in the section by 


    f(x) = g(a)   if x < a, 
           g(x)   if a ≤ x ≤ b, 
           g(b)   if x > b, 


has the properties claimed for it. 


12.3 THE BIG THEOREM SAILS INTO THE SUNSET 


THEOREM 12.4: An ordered field with the Least Upper Bound property is 
connected. 


PROOF: Let f be a continuous function from such a field to the real numbers, 
and let a and b be such that a < b and f(a) < 0 < f(b). The set P = f⁻¹((0, ∞)) is 
open, as is the set N = f⁻¹((−∞, 0)). Now N ∩ (−∞, b) is nonempty (it contains a) 
and is bounded above (by b), and so it has a supremum. Let c = sup(N ∩ (−∞, b)). 
Note that a ≤ c ≤ b. What is f(c)? It can't be positive. Since P is open, if c were in 
P, there would be an ε > 0 with y ∈ P for y ∈ (c − ε, c + ε), and any such number 
y < c would be an upper bound for N ∩ (−∞, b) less than c. In particular, c ≠ b 
since b ∈ P. Likewise, f(c) can't be negative since then c would be in 
N ∩ (−∞, b), and an open set doesn't contain its supremum. So c ≠ a. We have 
shown that a < c < b and that f(c) = 0. ■ 


COROLLARY 12.5: (The Intermediate Value Theorem) Intervals in the real 
line are connected. 


PROOF: With some manipulation, this is essentially the same as the previous 
proof. If f is a continuous function that fails to have the Intermediate Value 
property on an interval, one can construct from it a disconnection of all of R. ■ 
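
The way the Intermediate Value theorem "narrows down where those solutions might be" can be illustrated with the bisection method. What follows is only a sketch and is not part of the text's development: the name bisect_root, the tolerance, and the sample equation x^3 − 2 = 0 are all assumptions chosen for illustration. At each step the theorem guarantees a zero in whichever half of the interval still shows a sign change.

```python
from typing import Callable


def bisect_root(f: Callable[[float], float], a: float, b: float,
                tol: float = 1e-12, max_iter: int = 200) -> float:
    """Approximate a zero of a continuous f on [a, b].

    Assumes f(a) and f(b) have opposite signs, so the Intermediate
    Value theorem guarantees a zero somewhere in [a, b].
    """
    fa, fb = f(a), f(b)
    if fa == 0.0:
        return a
    if fb == 0.0:
        return b
    if fa * fb > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    for _ in range(max_iter):
        mid = (a + b) / 2
        fm = f(mid)
        if fm == 0.0 or (b - a) / 2 < tol:
            return mid
        # Keep the half-interval on which the sign change persists.
        if fa * fm < 0:
            b = mid
        else:
            a, fa = mid, fm
    return (a + b) / 2


# Hypothetical example: a cube root of 2 lies between 1 and 2.
root = bisect_root(lambda x: x**3 - 2, 1.0, 2.0)
```

Floating-point arithmetic only approximates the real numbers, so the returned value is an approximation to a zero and not the zero itself. The existence of the exact zero is what the theorem supplies.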


EXERCISES 12.3 


1. Complete the proof of Corollary 12.5. 


2. (a) Use the Intermediate Value theorem to show that a positive number a has 
all possible nth roots (that is, the equation x^n − a = 0 has a solution). Where 
does your argument fail if a is negative? 


(b) Show that any polynomial of odd degree has a real root (that is, there is a 
real number c so that f(c) = 0). 


3. (a) Let f: [0, 1] → [0, 1] be continuous. Show that there must be a number x 
for which f(x) = x. (That is, such a function must have a fixed point.) 


(b) Show that this is not true if the function is considered only on the rational 
numbers. 


(c) Show that the result in (a) is not true if the domain and range of the 
function are (0, 1) instead of [0, 1]. 


4. Suppose f: [a, b] → R and g: [a, b] → R are both continuous and that f(a) < 
g(a) and f(b) > g(b). 

(a) Draw a picture illustrating this situation. 

(b) Show that there must be a number x such that f(x) = g(x). 


(c) Use this to answer part (a) of the previous exercise. 


5. (a) Suppose f: [0, 2] → R is continuous and f(0) = f(2). Show that there are 
elements of [0, 2], say a and b, such that |a − b| = 1 and f(a) = f(b). 


(b) Suppose f: C → R is continuous, where C is the circle x² + y² = 1. (We 
have not defined continuity for such a function precisely, but don't worry 
about that now.) Show that there are points a and b on C that are 
diametrically opposed such that f(a) = f(b).¹ 


12.4 CLOSING THE LOOP 


This time we will "close the loop" (and thereby complete the proof of the Big 
Theorem) before moving on to other considerations. 


THEOREM 12.6: Any connected ordered field has the Least Upper Bound 
property. 


PROOF: Suppose the Least Upper Bound property fails. This means there is a 
nonempty set S that is bounded above but has no supremum. Let 
A = {t : ∃s ∈ S (t < s)} and B = R\A. We will show that A and B are a 
disconnection of R. Since S is not empty and is bounded above, neither A nor B 
is empty. Suppose x ∈ A and s ∈ S with x < s (this is how x gets to be in A) and 
let ε = s − x. If x − ε < y < x + ε, then y ∈ A. Thus A is open. Now suppose 
x ∈ B. Then x is an upper bound for S (since no element of S is greater than x), 
but x is not the least upper bound of S (since S has no least upper bound). Thus 
there is an ε > 0 such that x − ε is an upper bound for S. If y is such that 
x − ε < y < x + ε, we have y ∈ B, and therefore B is open. Since B is the 
complement of A, they are disjoint and their union is the whole field. So A and B 
are a disconnection of the field. ■ 


12.5 CONTINUOUS FUNCTIONS AND INTERVALS 


Here is another preservation theorem: 


THEOREM 12.7: If S is connected and f : S → R is continuous, then f(S) is 
connected. 


PROOF: We will use the forward-backward method, a proof by contrapositive, 
and the characterization of disconnectedness in Theorem 12.3. 


Suppose f(S) is disconnected. 


* * * 


Then S is disconnected. 


Suppose f(S) is disconnected. 
→ There are open sets A and B satisfying Theorem 12.3 relative to f(S). 


* * * 


→ There are open sets C and D satisfying Theorem 12.3 relative to S. 
Then S is disconnected. 


In the second line we see open sets in the range of a continuous function. This 
suggests our next step: 


Suppose f(S) is disconnected. 
There are open sets A and B satisfying Theorem 12.3 relative to f(S). 
→ Consider the sets f⁻¹(A) and f⁻¹(B). 


* * * 


There are open sets C and D satisfying Theorem 12.3 relative to S. 
Then S is disconnected. 


Suppose f(S) is disconnected. 
There are open sets A and B satisfying Theorem 12.3 relative to f(S). 
Consider the sets f⁻¹(A) and f⁻¹(B). 
→ f⁻¹(A) and f⁻¹(B) are open [by definition of continuity]. 
→ f⁻¹(A) and f⁻¹(B) are disjoint 
   [since ∅ = f⁻¹(∅) = f⁻¹(A ∩ B) = f⁻¹(A) ∩ f⁻¹(B)]. 
→ S ∩ f⁻¹(A) ≠ ∅ and S ∩ f⁻¹(B) ≠ ∅ 
   [since f(S) ∩ A ≠ ∅ and f(S) ∩ B ≠ ∅]. 
→ S ⊆ f⁻¹(f(S)) ⊆ f⁻¹(A ∪ B) = f⁻¹(A) ∪ f⁻¹(B). 
→ Let C = f⁻¹(A) and D = f⁻¹(B). 
There are open sets C and D satisfying Theorem 12.3 relative to S. 
Then S is disconnected. ■ 


By putting together the two preservation theorems we've proved, we find: 


THEOREM 12.8: If f: [a, b] → R is continuous, there exist real numbers c and 
d so that f([a, b]) = [c, d]. 


PROOF: Since [a, b] is compact, f([a, b]) is compact. Thus f([a, b]) is closed 
and bounded. Since [a, b] is connected, f([a, b]) is connected, and so it is an 
interval. ■ 


Remember Theorem 12.8 as The continuous image of a closed, bounded interval 
is a closed, bounded interval—this has both the Extreme Value theorem and the 
Intermediate Value theorem as corollaries. 


EXERCISES 12.5 


1. Suppose f: [a, b] → R is a continuous function whose range includes only 
rational numbers. Show that f is constant. 


2. Prove Theorem 12.7 using the definition of connectedness. 


3. Explain why the Extreme Value theorem and the Intermediate Value theorem 
are corollaries to Theorem 12.8. 


12.6 A COMMENT ON CALCULUS 


All of elementary calculus follows from the Extreme Value theorem and the 
Intermediate Value theorem, and so in a very real sense Theorem 12.8 embodies 
the entire theoretical content of elementary calculus. If you look through a 
calculus text for the results that are said to be "beyond the scope of this course," 
you will find theorems that you can prove (now) using Theorem 12.8. The 
Intermediate Value theorem is found mainly in results involving approximations 
to solutions of equations. As we've seen, it is used both to guarantee that 
solutions to certain equations exist and, in computations like Newton's method, 
to specify where a solution might be. The consequences of the Extreme Value 
theorem are more far-reaching: 


                  EXTREME VALUE THEOREM 
                            ⇓ 
                     ROLLE'S THEOREM 
                            ⇓ 
                   MEAN VALUE THEOREM 
                   ⇓                ⇓ 
       TAYLOR'S THEOREM        FUNDAMENTAL THEOREM 


For further contrast between the real and rational numbers, we show: 


THEOREM 12.9: The only connected subsets of the rational numbers are sets 
with a single element. 


PROOF: We know that sets with one element are connected. Suppose that 
S ⊆ Q and that a and b are distinct elements of S. Let z be an irrational number 
between a and b (by Corollary 6.2.d). Then A = (−∞, z) and B = (z, ∞) are a 
disconnection of S. ■ 


A set with the property that its only connected subsets are those with one 
element is called, appropriately enough, totally disconnected. Finally, a result 
promised some time ago: 


THEOREM 12.10: The only subsets of the real line that are both open and 
closed are the empty set and the whole line. 


PROOF: Suppose A is another set that is both open and closed. Since A is 
closed, C(A) is open. Since A is neither empty nor the whole line, C(A) is neither 
empty nor the whole line. Then A and C(A) form a disconnection of the real line, 
a contradiction. ■ 


EXERCISES 12.6 


1. Show that the Mean Value theorem fails in the rational numbers. 


2. (a) Use the Mean Value theorem to show that any function that is the 
derivative of another function has the Intermediate Value property (that is, if 
f is everywhere differentiable, then f′ has the Intermediate Value property). 


(b) Let f(x) = x² sin(1/x) for x ≠ 0 and f(0) = 0. Show that f is differentiable 
everywhere (the only place there is any doubt is at x = 0). 


(c) Show that f′ is not continuous at 0 (in (b) you found f′(0), and you can 
easily find the formula for f′ everywhere else). 


(d) Conclude that there are discontinuous functions with the Intermediate 
Value property. 


3. Show directly that the connectedness of an ordered field implies it has the 
Archimedean property. (In this case, "directly" means don't use this 
argument: connectedness ⇒ Least Upper Bound property ⇒ Archimedean 
property.) 


4. We have seen that three of the main parts of the Big Theorem 
(connectedness, the Least Upper Bound property, and the Heine-Borel 
theorem) imply the Archimedean property. The other three do not. Is there an 
underlying difference between these two groups of results? 


5. In this problem, give two solutions each to (a) and (b). In your first answer, 
use the fact that connected subsets of the real line are intervals. In your 
second answer, use either the definition of connectedness or the topological 
characterization given by Theorem 12.3. 


(a) If S is a connected set, show that S⁻ (the closure of S) is connected. 


(b) If S is a connected set and T is such that S ⊆ T ⊆ S⁻, show that T is 
connected. 


(c) Suppose A and B are connected sets with A ⊆ B and C is such that 
A ⊆ C ⊆ B. Is C necessarily connected? 


6. (a) Suppose S is a nonempty open set that isn't the whole real line. Show that 
there is a cluster point of S that is an element of C(S). Give an example of a 
set with just one such point. 


(b) Suppose S is a nonempty closed set that isn't the whole real line. Show 
that S must contain a point of which it is not a neighborhood. Give an 
example of a set with just one such point. (Go back now and do Exercise 
9.6.4 again.) 


7. TO CELEBRATE THE COMPLETION OF PART 2: Pick any two parts of 
the Big Theorem and prove directly that one implies the other (you may have 
to juggle the Archimedean property). For instance, you might show that 
"Any ordered field having the Least Upper Bound property is Archimedean 
and satisfies the Bolzano-Weierstrass theorem." Some of these connections 
are very difficult, some are easier. There are 30 from which to choose. We've 
done nine of them, and a couple of others have appeared as exercises. 


¹ An advanced theorem in topology says that if f is a continuous function defined on the surface of a 
sphere with its outputs in R², there are diametrically opposed points where f takes on the same value. In 
other words, at any given moment, there are two diametrically opposed points on the surface of the earth 
having the same temperature and barometric pressure. In order to interpret the theorem in this way, we must 
make some assumptions about temperature and pressure. 


Part Three 


Topics from Calculus 


We now know something about how the real number system works and how it 
differs from the rational number system. In the next section of the book, we put 
our knowledge to work in a survey of topics from calculus. These chapters are 
not intended as an exhaustive review of calculus. We will not discuss the 
physical applications that make up such an important part of an introductory 
calculus course. But calculus is, after all, applied real analysis. We approach 
these topics in such a way as to emphasize what they can tell us about the 
connections between theory and applications. Applications, in turn, help us to 
understand the theory better and give us an appreciation of why people might 
have thought about things in the way they did. 


A calculus course usually consists of an intuitive discussion of topics like 
these (with only occasional proof), along with some historical background 
(though the latter is often carefully concealed). Here, our main goals are to 
provide the proofs that are omitted or glossed over in such a course, and examine 
the theoretical background of the subject. You will notice that our approach to 
these proofs is somewhat different from the one we took in Part 2 of the book. 
Most calculus proofs are in essence existence questions. As such, they don't lend 
themselves well to the forward-backward method. We will attack them more 
directly. Ideally, you should always try to work out these proofs before you read 
the finished versions. Practically speaking, you should study the proofs in the 
rest of the book by asking why each step is done when it is, by trying to 
anticipate the steps as much as possible, and by checking to make sure that all 
the pieces are in their proper places when the proof is claimed to be complete. 


Some of the discussions in this part of the book will be familiar, others will 
not. Some concern techniques more than theory, while others advance the theory 
in directions we might not have considered before. Since we already know pretty 
much what calculus is about and have a general idea of how the subject fits 
together, we don't need to spend much time motivating the study of these ideas. 


Chapter 13 


Series 


13.1 WE BEGIN ON A CAUTIOUS NOTE 


Suppose we begin with a sequence, say (a_n), and construct from it another 
sequence, (s_n), by letting s_n = a_1 + a_2 + ... + a_n. We considered this process 
briefly in Exercise 10.2.9, where we saw that the behavior of one sequence like 
(s_n) can sometimes be deduced from the behavior of another. From this simple 
idea, we will develop some very powerful tools. 


DEFINITION 13.1: A series is a sequence constructed in the manner of (s_n) 
above. The terms of (a_n) are also called the terms of the series, while the terms 
of (s_n) are called the partial sums of the series.¹ 
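
The construction of a series from its terms can be sketched in a few lines of code. The names partial_sums and reciprocal_powers_of_two below are hypothetical, and the choice of a_n = 1/2^n as the sample sequence is an assumption made only for illustration. Exact rational arithmetic is used so that no rounding obscures the pattern.

```python
from fractions import Fraction
from itertools import accumulate, islice
from typing import Iterable, Iterator


def partial_sums(terms: Iterable[Fraction]) -> Iterator[Fraction]:
    """Yield s_1, s_2, ... where s_n = a_1 + a_2 + ... + a_n."""
    return accumulate(terms)


def reciprocal_powers_of_two() -> Iterator[Fraction]:
    """The sample sequence a_n = 1/2^n for n = 1, 2, 3, ..."""
    n = 1
    while True:
        yield Fraction(1, 2**n)
        n += 1


# The series is the sequence of partial sums, not the sequence of terms.
first_five = list(islice(partial_sums(reciprocal_powers_of_two()), 5))
# first_five holds 1/2, 3/4, 7/8, 15/16, 31/32
```

The sketch keeps the two sequences visibly separate: the generator produces the terms (a_n), while partial_sums produces the series (s_n) itself.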


THREE NOTES OF CAUTION: 1) It is very important to remember that a 
series is a special sort of sequence. A series and its limit are both denoted 
Σ_{n=1}^∞ a_n or just Σ a_n. The former notation will be used when the value of 
the limit of the series is of some importance, the latter when it is not (which is 
usually the case in our work). Theorem 13.3 will tell us that, if our only concern 
is convergence or divergence, we can ignore where the index begins. This 
notation can be misleading since we refer to a series, before we know whether it 
converges, with the same symbols we use to denote its limit. 


2) The resemblance of a series to an "infinite sum" is a mixed blessing. Series 
and sums, as we will see, can behave very differently. We will use "sum" to 
mean "finite sum" and avoid the phrase "infinite sum" as much as possible. On 
the other hand, we often refer to the limit of a series as its "sum" and write 
Σ a_n = S instead of lim Σ a_n = S. We also often use the suggestive "S" rather 
than "L" for this limit. Since this material is familiar from calculus, and since 
we've already done the difficult part of the work in our study of sequences, this 
abuse of notation should not cause serious difficulty. 


3) The individual outputs of the function that makes up a sequence are 
traditionally referred to as "terms." Likewise, numbers being added in a sum are 
called "terms." But the terms of a series (as defined above) and the terms of the 
sequence of partial sums (which is what the series really is) are very different 
things. This double use of the word "term" may be the root of much of the 
confusion between series and sequences, some of which you undoubtedly felt 
yourself in your calculus class. 


13.2 BASIC CONVERGENCE THEOREMS 


We begin our study by restating the main results of Chapters 9 and 10 in the 
language of series. These are for the most part corollaries to results in those 
chapters. Those not proved here should be taken as exercises. 


DEFINITION 13.2: The series Σ a_n, with sequence of partial sums (s_n), 
converges to S if, for every ε > 0, there is an N_ε ∈ N so that |s_n − S| < ε 
whenever n > N_ε. 


Changing or deleting finitely many terms of a sequence has no effect on its limit, 
but we must be a bit more careful with series: 


THEOREM 13.3: (a) If k and m are such that all terms in both series are 
defined, then Σ_{n=k}^∞ a_n converges if and only if Σ_{n=m}^∞ a_n converges. 

(b) Changing (or deleting) finitely many terms does not change whether a series 
converges, but may change its limit. 


PROOF: (a) We may suppose k > m. Note that the partial sums of Σ_{n=k}^∞ a_n 
differ from those of Σ_{n=m}^∞ a_n by a_m + ... + a_{k−1}. The result follows from 
Theorem 9.11. 


(b) Suppose Σ_{n=k}^∞ a_n and Σ_{n=j}^∞ c_n are such that c_n = a_n for n ≥ n_0. The 
result is obtained by applying (a) to Σ_{n=k}^∞ a_n, Σ_{n=j}^∞ c_n, Σ_{n=n_0}^∞ a_n, and 
Σ_{n=n_0}^∞ c_n. ■ 


THEOREM 13.4: If Σ_{n=1}^∞ a_n converges to S, Σ_{n=1}^∞ b_n converges to T, and c 
is a real number, then Σ_{n=1}^∞ (a_n + b_n) converges to S + T and Σ_{n=1}^∞ c a_n 
converges to cS. ■ 

More is being said here than meets the eye. If these were finite sums, the first 
statement would just be the ordinary commutative property, but the series 
indicated by (a_1 + a_2 + ...) + (b_1 + b_2 + ...) can't be obtained from 
a_1 + b_1 + a_2 + b_2 + ... just by rearranging terms. 


The question of what happens when we multiply series is much more 
complicated. (It is not even clear how we should go about doing it.) We will deal 
with multiplication of series in Section 13.11. 


EXERCISES 13.2 


1. Explain why (a_1 + a_2 + ...) + (b_1 + b_2 + ...) can't be obtained from 
a_1 + b_1 + a_2 + b_2 + ... just by rearranging terms. 


2. Suppose that a_n > 0 for all n and let b_n = (a_1 + a_2 + ... + a_n)/n. Show that 
Σ b_n diverges. 

3. (a) Show that the insertion of parentheses into a convergent series does not 
change its convergence or limit. For example, 1 + 1/2 + 1/4 + 1/8 + 1/16 + ... 
might become (1 + 1/2) + 1/4 + (1/8 + 1/16) + .... Let us agree that each set of 
parentheses can enclose only finitely many terms, and there is no nesting. 


(b) Show that the insertion of parentheses into a divergent series can cause it 
to converge. 


(c) Suppose that the terms enclosed in any pair of parentheses all have the 
same sign. Show that the resulting series converges if and only if the original 
series does. 


4. Suppose S is an uncountable set of positive real numbers. Show that, for any 
real number B, there is a finite subset of S whose sum is more than B. (Hint: 
Think about the cluster points of such a set.) 


5. The previous exercise is an important result. It may be rephrased by saying 
that if an "uncountable" sum converges, it must be the case that all but 
countably many of its terms are zero. But aren't integrals just convergent 
"uncountable sums"? Explain. 


13.3 SERIES WITH POSITIVE TERMS 


Except for a possible different sign in the first term, the sequence (s_n) is 
increasing if a_n ≥ 0 for all n and is decreasing if a_n ≤ 0 for all n. The Monotone 
Convergence theorem may be restated: 


THEOREM 13.5: Excepting the first term, if a_n ≥ 0 for all n (or a_n ≤ 0 for all 
n), then Σ a_n converges if and only if (s_n) is bounded. ■ 


EXAMPLES 13.3: 1. Σ_{n=1}^∞ (1/2)^n converges. It is easily seen by an induction 
that s_n = 1/2 + 1/4 + ... + 1/2^n = 1 − 1/2^n, and so s_n < 1 for all n. By 
Corollary 10.4, Σ_{n=1}^∞ (1/2)^n = sup s_n = 1. 


This is an example of a geometric series, which are among the few series whose 
limits we can actually compute. Let s_n = 1 + r + r² + ... + r^n, and so 
r s_n = r + r² + ... + r^{n+1}. Then s_n − r s_n = 1 − r^{n+1}, and 
s_n = (1 − r^{n+1})/(1 − r). From this we can see that Σ_{n=0}^∞ r^n converges if and 
only if |r| < 1, and if so, the limit is 1/(1 − r). This observation is called the 
Geometric Series Test. 
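
The closed form for the partial sums of a geometric series, s_n = (1 − r^{n+1})/(1 − r), can be checked against direct addition. This is a sketch only; the function names and the sample ratio r = 2/3 are assumptions for illustration. Exact fractions are used so that the two computations can be compared with equality and not merely approximately.

```python
from fractions import Fraction


def geometric_partial_sum(r: Fraction, n: int) -> Fraction:
    """Return s_n = 1 + r + r^2 + ... + r^n by direct addition."""
    total = Fraction(0)
    power = Fraction(1)
    for _ in range(n + 1):
        total += power
        power *= r
    return total


def geometric_closed_form(r: Fraction, n: int) -> Fraction:
    """Return (1 - r^(n+1)) / (1 - r); only meaningful when r != 1."""
    return (1 - r ** (n + 1)) / (1 - r)


r = Fraction(2, 3)
limit = 1 / (1 - r)                          # 1/(1 - r) = 3 for this r
gap = limit - geometric_partial_sum(r, 50)   # equals r^51 / (1 - r)
```

The gap between the limit and s_n is r^{n+1}/(1 − r), which shrinks to 0 exactly when |r| < 1. That is the content of the Geometric Series Test.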


2. Σ 1/n diverges. This is called the Harmonic Series.² Here we will compare the 
harmonic series with another series that we construct term by term: 


    1 + 1/2 + 1/3 + 1/4 + 1/5 + 1/6 + 1/7 + 1/8 + ... 

    1 + 1/2 + 1/4 + 1/4 + 1/8 + 1/8 + 1/8 + 1/8 + ... 


Each term in the harmonic series is at least as large as the one below it. The 
terms in the bottom series can be gathered into groups whose sum is 1/2, and so 
it diverges (its partial sums are unbounded by the Archimedean property). The 
partial sums of the harmonic series also are unbounded, and so it diverges. 
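
The term-by-term comparison can be tested with exact arithmetic. In this sketch the lower series is modeled by the hypothetical function comparison_term, which assumes that its nth term is 1 over the smallest power of two that is at least n (so the terms run 1, 1/2, 1/4, 1/4, 1/8, ...). The names and the range of k are illustrative choices only.

```python
from fractions import Fraction
from typing import Callable


def harmonic_term(n: int) -> Fraction:
    """n-th term of the harmonic series."""
    return Fraction(1, n)


def comparison_term(n: int) -> Fraction:
    """n-th term of the lower series: 1 / (smallest power of two >= n)."""
    return Fraction(1, 1 << (n - 1).bit_length())


def partial_sum(term: Callable[[int], Fraction], n: int) -> Fraction:
    """Return term(1) + term(2) + ... + term(n) exactly."""
    return sum((term(k) for k in range(1, n + 1)), Fraction(0))


# Each row is (n, partial sum of lower series, partial sum of harmonic series)
# at n = 2^k.  The lower sum is exactly 1 + k/2, so it is unbounded.
rows = [(2**k, partial_sum(comparison_term, 2**k), partial_sum(harmonic_term, 2**k))
        for k in range(8)]
```

Each doubling of n adds exactly 1/2 to the lower partial sum, so a partial sum exceeding any given bound can be found by doubling often enough. No finite computation proves divergence, of course; the code only illustrates the grouping used in the argument.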


EXERCISES 13.3 


1. We saw that Σ_{n=0}^∞ r^n = 1/(1 − r). Find a formula for Σ_{n=k}^∞ r^n, where 
k ≠ 0. 


2. Give an example of a positive series of rational numbers with bounded 
partial sums that does not have a rational limit. 


3. (a) Let (a_n) and (b_n) be sequences where the terms of (b_n) are the same as 
those of (a_n) except that some might be repeated finitely many times. For 
example, (a_n) might begin 1, 2, 3, ..., while (b_n) might begin 1, 1, 2, 2, 2, 3, 
3, .... Show that (a_n) converges if and only if (b_n) converges, and if so their 
limits are the same. 


(b) Let Σ c_n and Σ a_n be series in which the terms of Σ a_n are the same as the 
nonzero terms of Σ c_n (for instance, if the first few terms of Σ c_n are 
1 + 0 + 2 + 0 + 3 + ..., the first few terms of Σ a_n would be 1 + 2 + 3 + ...). 
Show that Σ c_n converges if and only if Σ a_n converges, and if so their limits 
are the same. 

(c) Find all places in the chapter (before and after this exercise) where this 
result is assumed. 


13.4 SERIES AND THE CAUCHY CRITERION 


If (s_n) is the sequence of partial sums of a series Σ a_n and n > m, then 
s_n − s_m = a_n + a_{n−1} + ... + a_{m+1}. With a small adjustment to make it easier 
to read, the Cauchy criterion for series may be stated as follows. 


THEOREM 13.6: The series Σ a_n converges if and only if, for any ε > 0, there 
is an N ∈ N so that |a_m + a_{m+1} + ... + a_n| < ε whenever n ≥ m > N. ■ 


EXAMPLES 13.4: 1. Theorem 13.6 and the argument of the previous example 
give us another proof that Σ 1/n diverges. We need only note that 


    1/(2^{k−1} + 1) + 1/(2^{k−1} + 2) + ... + 1/2^k ≥ 2^{k−1} · (1/2^k) = 1/2. 


For any N, there is a group of terms with indices larger than N that add up to 
at least 1/2. The Cauchy criterion fails for any ε < 1/2; the harmonic series 
diverges. 


COROLLARY 13.7: If Σ a_n converges, then lim a_n = 0. 


PROOF: Apply Theorem 13.6 with m = n. ■ 


Corollary 13.7 is called the nth Term Test, which is usually stated in the 
contrapositive: If lim a_n ≠ 0, then Σ a_n diverges. The harmonic series reminds us 
that the converse of Corollary 13.7 is not true. 


13.5 COMPARISON TESTS 


Our goal is usually to decide whether a series converges rather than to find its 
limit if it does. Among the tools that make this possible is a collection of results 
referred to as "convergence tests." These fall into two general categories: 
comparison tests and internal tests. We have already seen two internal tests (the 
nth Term test and the Geometric Series test) and one comparison test (Exercise 
10.2.9, which is usually called simply the Comparison Test). We restate the 
latter here in the terminology of series. We say a series is positive if each of its 
terms is positive. 


THEOREM 13.8: Let Σ a_n and Σ b_n be positive series with a_n ≤ b_n for all n. If 
Σ b_n converges, so does Σ a_n; if Σ a_n diverges, so does Σ b_n. ■ 


As noted in Exercise 10.2.9, these are the only conclusions that can be drawn 
from these assumptions. In view of Theorem 13.3, the comparison of a_n and b_n 
need only hold eventually. 


I ; E26 21 ~ 
EXAMPLES 13.5: 1. © 37 converges since 2 ~ 2" for all n and La 
converges. 

2. Σ 1/ln(n) diverges. From calculus, we know that lim ln(n)/n = 0. This means 
that, for n sufficiently large, ln(n)/n < 1 and 1/ln(n) > 1/n (though it might not be 
immediately obvious just how large is "sufficiently large" in this case). 


It is often very difficult to meet the rather demanding hypotheses of the 
Comparison test (from some point on, every pair of terms has to line up 
properly). A comparison test that could indicate that two series are generally 
similar might be very useful. 


THEOREM 13.9: (The Limit Comparison Test) Let Σ a_n and Σ b_n be positive 
series and suppose lim(a_n/b_n) = L > 0. Then Σ a_n and Σ b_n either both converge 
or both diverge. 


PROOF: It must eventually be true that 0 < (L/2)b_n < a_n < 2Lb_n, and the result 
follows from the Comparison test, the comment following it, and Theorem 9.11. 
■ 


You will investigate in Exercise 13.5.1 whether anything can be salvaged from 
the Limit Comparison test when L = 0 or if the sequence of ratios fails to 
converge. 


EXAMPLES 13.5: 3. Σ (arctan n)/n diverges. Comparing this to the harmonic 
series using the Limit Comparison test gives a limit of π/2 or 2/π. (Since the 
Limit Comparison test requires only that L ≠ 0, it doesn't matter which terms are 
put in the numerator.) We know that the harmonic series diverges; consequently 
this series diverges. 


4. It is difficult to say anything precise about the terms of a series like 


    Σ (14n³ − 5n² + 17n − 2)/(6n⁴ + 3n² − 2n + 11), 


but we can compare it to the harmonic series using the Limit Comparison test. 
(Why have we chosen the harmonic series to do this comparison?) The limit 
would be 14/6 or 6/14, and so this series diverges. (The fact that this limit is 
positive guarantees that the terms in this series are eventually positive.) 
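
The limit in the last example can be watched numerically. The sketch below assumes the terms are a_n = (14n³ − 5n² + 17n − 2)/(6n⁴ + 3n² − 2n + 11) and compares them with the harmonic terms 1/n; the function names and the sampled values of n are hypothetical choices. It also checks, for a range of n, the two-sided bound (L/2)b_n < a_n < 2Lb_n used in the proof of the Limit Comparison test, with L = 14/6.

```python
from fractions import Fraction


def a(n: int) -> Fraction:
    """Term of the series in the example (as read from the text)."""
    return Fraction(14 * n**3 - 5 * n**2 + 17 * n - 2,
                    6 * n**4 + 3 * n**2 - 2 * n + 11)


def ratio_to_harmonic(n: int) -> Fraction:
    """Return a_n / b_n where b_n = 1/n is the harmonic term."""
    return a(n) / Fraction(1, n)


L = Fraction(14, 6)

# Ratios at n = 10, 100, ..., 10^6, converted to floats for readability.
ratios = [float(ratio_to_harmonic(10**k)) for k in range(1, 7)]

# The bound from the proof; for this series it happens to hold from n = 1 on.
bound_holds = all(L / 2 < ratio_to_harmonic(n) < 2 * L for n in range(1, 200))
```

Numerical evidence of this kind suggests which comparison series to try, but the conclusion still rests on computing the limit exactly.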


EXERCISES 13.5 


There are very few exercises in this book that say "Show that this series 
converges." You should practice using the convergence tests by finding several 
calculus texts and doing the exercises on convergence in them. 


1. Discuss the conclusions, if any, that can be drawn if the limit in the Limit 
Comparison test is 0 or +∞. 


2. Suppose that Σ a_n is a positive series and that lim a_n/(1/n^p) = L ≠ 0. 
That is, the Limit Comparison test has worked for Σ a_n and Σ 1/n^p (but 
notice that, except for p = 1, we don't know yet whether such a series 
converges or diverges). Show that the Limit Comparison test will fail if Σ a_n 
is compared to any series Σ 1/n^q with q < p. 


3. (a) Let Σ a_n and Σ b_n be positive series with sequences of partial sums (s_n) 
and (t_n), respectively. If s_n ≤ t_n for all n, show that convergence of Σ b_n 
implies convergence of Σ a_n and that divergence of Σ a_n implies divergence of 
Σ b_n. Explain how this statement differs from the Comparison test. 


(b) Show that the Comparison test follows from the result in (a). 
(c) Give an example in which this test works but the Comparison test fails. 


(d) Show that the condition in (a) that the series are positive can't be dropped. 


13.6 THE INTEGRAL TEST 


Here is a test in which we compare a series with an integral. Recall that we say 
that the integral ∫_m^∞ f(x) dx converges if lim_{b→∞} ∫_m^b f(x) dx exists. 


THEOREM 13.10: (The Integral Test) Suppose that f(x) is a positive, 
decreasing function whose domain contains some ray [m, ∞). Then the series 
Σ_{n=m}^∞ f(n) converges if and only if the integral ∫_m^∞ f(x) dx converges. 


PROOF: This is one of those rare proofs that can be seen all at once by drawing 
just the right pictures: 


The rectangles in the picture on the left have areas f(1), f(2), .... We see that the
partial sums of the series are larger than the corresponding "partial integrals" of
the integral. If the series converges, its partial sums are bounded above, and so
are the partial integrals. The integral converges by Exercise 10.2.8. The
rectangles in the second picture are smaller than the corresponding sections of
the area under the curve. The areas of these rectangles are f(2), f(3), .... The
same sort of argument shows that the convergence of the integral implies that of
the sum. ∎


Observe that while the integral test says the series converges if and only if the 
integral does, they very likely don't converge to the same value. 
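
The observation can be checked numerically. What follows is only a minimal Python sketch, assuming the illustrative choice f(x) = 1/x^2 with m = 1 (this choice, and the helper names `partial_sum` and `partial_integral`, are ours and do not come from the text). It tests the two inequalities read off from the pictures in the proof and prints a partial sum next to the corresponding partial integral.

```python
def f(x):
    # Illustrative choice (an assumption): positive and decreasing on [1, infinity).
    return 1.0 / x**2

def partial_sum(n):
    # f(1) + f(2) + ... + f(n)
    return sum(f(k) for k in range(1, n + 1))

def partial_integral(b):
    # Integral of 1/x^2 from 1 to b, computed from the antiderivative -1/x.
    return 1.0 - 1.0 / b

n = 10_000
s_n = partial_sum(n)

# Left-hand picture: rectangles f(1), ..., f(n) cover the area under the curve on [1, n + 1].
assert partial_integral(n + 1) <= s_n
# Right-hand picture: rectangles f(2), ..., f(n) fit under the curve on [1, n].
assert s_n - f(1) <= partial_integral(n)

print(s_n, partial_integral(n))
```

For this choice the partial sums settle near 1.64 while the partial integrals approach 1, so both limits exist but they are different numbers.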


EXAMPLES 13.6: 1. $\sum 1/n^p$ is called a p-series. We will examine its behavior
for all values of p. We know this series diverges if p = 1, and if $p \le 0$, the terms
of the series do not approach 0. If p > 0 and $p \neq 1$, the function $f(x) = 1/x^p$ is
positive and decreasing and

$$\int_1^\infty \frac{1}{x^p}\,dx = \lim_{b\to\infty} \int_1^b x^{-p}\,dx = \lim_{b\to\infty} \frac{b^{1-p} - 1}{1-p}.$$

The latter limit exists when p > 1 but not when p < 1. Thus $\sum 1/n^p$ converges if p
> 1 and diverges if p < 1.


2. The result in the previous example is called the p-test. The p-test can be used 


along with the Limit Comparison test to decide the behavior of series defined by 
an algebraic function. For instance, the series 


= ¥/4n2 + 19n + 31 
V9n? + 13n4 +1 


: bese : : : 1 
can be compared using the Limit Comparison test with the series dats, The 
series L W778 converges by the p-test, so this series converges. 
3. The series $\sum \frac{1}{n(\ln n)^p}$ can be seen by the Integral test to converge when p > 1


and diverge when p < 1. This illustrates the hazards of using a calculator to 
examine questions of convergence. If we had p=. cle doha eka lel it 


ye ie 
would be very difficult to distinguish ~ atta _ from —* atta) with a calculator 
(no matter how accurate your calculator might be). Nevertheless, the former 
series converges and the latter diverges. 


EXERCISES 13.6 


1. (a) Show that even if the series and the integral in the Integral test both 
converge, they don't necessarily converge to the same thing. 


(b) Is there any circumstance under which the series and the integral would 
converge to the same thing? 


(c) Examine the pictures used in the proof of the Integral test to find 
estimates of the value of the integral versus the limit of the series. (Is the 
integral always greater than the series? Less? By how much?) 


2. (a) Show that the hypothesis that f(x) is decreasing can't be omitted from the 
Integral test. 


(b) The functions in the pictures used to prove the Integral test are both 
concave up. Is this necessary? 


3. Provide the missing details in the proof of Theorem 13.10 (for instance, 
examine the "partial integrals"). Check that Exercise 10.2.8 has been used 
correctly. 


4. The apparent precision of the p-test might lead us to believe that there is a 
sharp distinction between the way the terms in a convergent series go to 0 
and the way the terms in a divergent series go to 0. In this exercise, we see 
that this is not the case. 


(a) Let $a_n = 1/n$ and $b_n = 1/(n \ln(n))$. Recall that $\sum a_n$ diverges. Show that
$\sum b_n$ diverges and that $\lim(b_n/a_n) = 0$. (Observe that $b_n$ goes to 0 faster than
$a_n$ but $\sum b_n$ still diverges.)


(b) Let $a_n = 1/n^2$ and $b_n = \ln(n)/n^2$. Recall that $\sum a_n$ converges. Show that
$\sum b_n$ converges and $\lim(a_n/b_n) = 0$. (In this case $b_n$ goes to 0 slower than $a_n$,
but $\sum b_n$ still converges.)


(c) Suppose $\sum a_n$ is a convergent, positive series. If B > 0 and $\varepsilon > 0$ are given,
show that there is a natural number N so that $Ba_m + \dots + Ba_n < \varepsilon$ whenever
n > m > N.


(d) Suppose $\sum a_n$ is a divergent, positive series. If B > 0, $N \in \mathbb{N}$, and $\varepsilon > 0$
are given, show that there are natural numbers n > m > N so that
$\varepsilon a_m + \dots + \varepsilon a_n > B$.


(e) Carefully state and prove a theorem to the effect that For any divergent 
series, there is a divergent series whose terms go to 0 faster; for any 
convergent series, there is a convergent series whose terms go to 0 slower. 
("Faster" and "slower" have essentially been defined in this exercise.) 


13.7 THE RATIO TEST 


The great drawback of comparison tests is that they require us to find a 
comparable series (or integral) before they can be used. This means in essence 
that we have to have a good idea what the answer to the question is before we 
can start. If the behavior of a series escapes our intuition, we may have no 


indication how to begin. It might be preferable to have tests that can be applied 
directly to the series in question, without reference to other series. The most 
familiar of these is the Ratio Test. 


THEOREM 13.11: Suppose $\sum a_n$ is a positive series and $\lim(a_{n+1}/a_n)$ exists
(say it is equal to L). Then $\sum a_n$ converges if L < 1 and diverges if L > 1. If L = 1,
the test gives no information.


PROOF: We may dispose of the final comment by noting that L = 1 for any
p-series. Some p-series converge, some diverge. Suppose L > 1. Note that $(1, \infty)$ is
a neighborhood of L, and let N be such that $a_{n+1}/a_n \in (1, \infty)$ for $n \ge N$. For n >
N, we have $a_n > a_{n-1} > \dots > a_N$. Since $a_N > 0$, these terms do not approach 0, and
the series diverges by the nth Term test. Now suppose L < 1 and let r be such that
L < r < 1, and N such that $0 < a_{n+1}/a_n < r$ for $n \ge N$. Then for n > N we have
$a_n < r^{n-N} a_N$ (the pattern can be seen by looking at $a_{N+1}$, $a_{N+2}$, ...), and the
proof is completed by using the Comparison test with $\sum a_n$ and the geometric series
$\sum (a_N/r^N)\, r^n$. ∎


The theoretical power of the Ratio test (and of all convergence tests) lies in the 
Comparison test. This should not be surprising since the Comparison test is the 
bridge between series and the completeness of the real numbers. 


EXAMPLES 13.7: 1. $\sum 100^n/n!$ converges since the limit in the Ratio test is 0
(which is less than 1). The Ratio test is especially useful in dealing with series
whose terms involve exponentials and factorials.


2. The series $\sum n!/n^n$ converges. The limit in the Ratio test is 1/e < 1 (a
calculation that requires l'Hôpital's Rule).
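
The two limits in these examples can be watched numerically. Here is a minimal Python sketch; the function names are hypothetical, and working with logarithms of the terms (through `math.lgamma`, where lgamma(n + 1) = log(n!)) is our own device for avoiding overflow, not something taken from the text.

```python
import math

def log_power_over_factorial(n):
    # log of 100^n / n!
    return n * math.log(100) - math.lgamma(n + 1)

def log_factorial_over_power(n):
    # log of n! / n^n
    return math.lgamma(n + 1) - n * math.log(n)

def ratio(log_term, n):
    # a_{n+1} / a_n, recovered from the logarithms of the two terms
    return math.exp(log_term(n + 1) - log_term(n))

for n in (10, 100, 1000):
    print(n, ratio(log_power_over_factorial, n), ratio(log_factorial_over_power, n))
print("1/e =", 1 / math.e)
```

The first ratio equals 100/(n + 1), so it exceeds 1 for small n; only its limit matters for the test. A table like this suggests the value of the limit but is of course no substitute for computing it.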


The Ratio test is essentially a Limit Comparison test with a geometric series 
whose common ratio r we don't know before the test begins. This is why the
ratio test doesn't work on p-series. A p-series doesn't resemble any geometric 
series in the sense of the Limit Comparison test. 


EXERCISES 13.7 


1. For what, if any, values of p does the series indicated by 


converge? 


2. (a) Prove the following strengthening of the Ratio test: Let $\sum a_n$ be a
positive series. If $\limsup_{n\to\infty}(a_{n+1}/a_n) < 1$, then $\sum a_n$ converges. If
$\liminf_{n\to\infty}(a_{n+1}/a_n) > 1$, then $\sum a_n$ diverges.

(b) Why does this constitute a strengthening of the Ratio test? 


(c) State and prove a lim sup-lim inf version of the Limit Comparison test. 


13.8 TWO MORE TESTS FOR POSITIVE SERIES 


The Ratio test is the best-known internal convergence test, but there are others, 
each useful in its own way. We will state two. The first can sometimes be used to 
decide convergence of a series when the Ratio test has failed. Notice that the 
limit in this test exists only if $\lim a_{n+1}/a_n = 1$ (that is, if the Ratio test has failed),
and that, even so, it must eventually be the case that $a_n/a_{n+1} > 1$, otherwise the
terms of the series do not go to 0.


THEOREM 13.12: (Raabe's Test) Let $\sum a_n$ be a positive series and suppose that

$$\lim\, n\left(\frac{a_n}{a_{n+1}} - 1\right) = L.$$

If L > 1, then $\sum a_n$ converges; if L < 1, then $\sum a_n$ diverges.


PROOF: We will prove the first statement. The rest is Exercise 13.8.1. Let r be
such that 1 < 1 + r < L (this choice is made in hindsight to make the proof go
more smoothly). Let N be such that $n\left(\frac{a_n}{a_{n+1}} - 1\right) > 1 + r$ when $n \ge N$.
Manipulating this inequality, we obtain $na_n - (n+1)a_{n+1} > ra_{n+1}$ for $n \ge N$.


Writing out these expressions:

$$\begin{aligned}
Na_N - (N+1)a_{N+1} &> ra_{N+1} \\
(N+1)a_{N+1} - (N+2)a_{N+2} &> ra_{N+2} \\
&\ \ \vdots \\
Ma_M - (M+1)a_{M+1} &> ra_{M+1}
\end{aligned}$$

(note that all these expressions are positive). Adding, we obtain

$$Na_N - (M+1)a_{M+1} > r(a_{N+1} + a_{N+2} + \cdots + a_{M+1}).$$

Since $(M+1)a_{M+1} > 0$, we have $Na_N > r(a_{N+1} + a_{N+2} + \dots + a_{M+1})$. Now r is
fixed, and the left side of this inequality does not depend on M. Thus the partial
sums of $\sum a_n$ are bounded, and the series converges. ∎

EXAMPLES 13.8: 1. Raabe's test doesn't help us with the harmonic series, but
it does correctly decide the behavior of p-series with $p \neq 1$, where the Ratio test
fails. For example, if $a_n = 1/n^2$, the expression in Raabe's test simplifies to
$(2n + 1)/n \to 2$.
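
The simplification can be confirmed with exact rational arithmetic. The following is a small Python sketch; the helper names `raabe_expression` and `p_series_term` are hypothetical, introduced only for illustration.

```python
from fractions import Fraction

def p_series_term(p):
    # Returns the function n -> 1/n^p as an exact fraction (p a positive integer).
    return lambda n: Fraction(1, n**p)

def raabe_expression(a, n):
    # n * (a_n / a_{n+1} - 1), the quantity whose limit Raabe's test examines
    return n * (a(n) / a(n + 1) - 1)

for n in (1, 10, 100, 1000):
    value = raabe_expression(p_series_term(2), n)
    assert value == Fraction(2 * n + 1, n)   # matches (2n + 1)/n exactly
    print(n, value, float(value))

# For the harmonic series the expression is identically 1, so the test is silent.
assert all(raabe_expression(p_series_term(1), n) == 1 for n in range(1, 50))
```

Using `Fraction` keeps the comparison with (2n + 1)/n exact rather than approximate.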


Our final convergence test is an internal/comparison test hybrid with a 
delightfully descriptive name. Like the comparison tests, the convergence of one 
series is decided by looking at another one, but the second series is generated 
internally. Its proof is like our first proof of the divergence of the harmonic 
series, which reminds us that sometimes a technique we create to solve a small 
problem might be more useful than we realize. 


THEOREM 13.13: (The Cauchy Condensation Test) If $a_n \ge a_{n+1} > 0$ for all n,
then $\sum a_n$ converges if and only if $\sum 2^n a_{2^n}$ converges.


PROOF: We will show the "if" direction; the rest is Exercise 13.8.2. Since $(a_n)$
is decreasing, for any n we have

$$a_{2^n} + a_{2^n+1} + \cdots + a_{2^{n+1}-1} \le 2^n a_{2^n}.$$

Thus if the partial sums of the series $\sum 2^n a_{2^n}$ are bounded, so are those of $\sum a_n$,
and if $\sum 2^n a_{2^n}$ converges, so does $\sum a_n$. ∎


EXAMPLES 13.8: 2. The Cauchy Condensation test provides an easy proof of
the p-test. When we convert the series $\sum 1/n^p$ in the manner of the Cauchy
Condensation test, we get $\sum 2^n \left(\frac{1}{(2^n)^p}\right) = \sum \left(\frac{1}{2^{p-1}}\right)^n$, a geometric series that converges
when p > 1 and diverges if $p \le 1$.
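
Both the block inequality in the proof of Theorem 13.13 and the condensed form of a p-series can be checked exactly for small cases. This Python sketch uses hypothetical helper names (`p_term`, `condensed_term`, `block_sum`) and restricts p to positive integers so that `Fraction` arithmetic applies.

```python
from fractions import Fraction

def p_term(p):
    # n -> 1/n^p as an exact fraction (p a positive integer)
    return lambda n: Fraction(1, n**p)

def condensed_term(a, n):
    # 2^n * a_{2^n}, the n-th term of the condensed series
    return 2**n * a(2**n)

def block_sum(a, n):
    # a_{2^n} + a_{2^n + 1} + ... + a_{2^(n+1) - 1}
    return sum(a(k) for k in range(2**n, 2**(n + 1)))

a = p_term(2)
for n in range(8):
    assert block_sum(a, n) <= condensed_term(a, n)      # inequality from the proof
    assert condensed_term(a, n) == Fraction(1, 2)**n    # geometric with ratio 1/2^(p-1)

# For p = 1 every condensed term is 1, so the condensed series diverges.
assert all(condensed_term(p_term(1), n) == 1 for n in range(8))
```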


EXERCISES 13.8 


1. (a) Complete the proof of Theorem 13.12. 


(b) Show that $\sum a_n$ converges if $\liminf\, n\left(\frac{a_n}{a_{n+1}} - 1\right) > 1$ and diverges if
$\limsup\, n\left(\frac{a_n}{a_{n+1}} - 1\right) < 1$.

2. Complete the proof of Theorem 13.13. 


3. (a) Use the Cauchy Condensation test to examine the series $\sum \frac{1}{n \ln(n)}$.


(b) Use the Cauchy Condensation test to examine $\sum \frac{1}{n(\ln n)^p}$ for $p \neq 1$.


In(-2a—1)|/=2 
4. (a) Show that the condition l" C= | (in Raabe's test) implies that 
(na,,) is eventually decreasing. 


(b) Show that the condition "$(na_n)$ is eventually decreasing" is not sufficient
to guarantee convergence of $\sum a_n$.


(c) Show that "$\lim na_n = 0$" is not sufficient to guarantee convergence of $\sum a_n$.


(d) Suppose $\varepsilon > 0$. Show that "$\lim(n^{1+\varepsilon})a_n = 0$" does guarantee that $\sum a_n$
converges.

(e) Suppose $(\varepsilon_n)$ is a sequence of numbers such that there is an $\varepsilon > 0$ with
$\varepsilon_n > \varepsilon$ for all n. Show that the condition "$\lim(n^{1+\varepsilon_n})a_n = 0$" guarantees
convergence of $\sum a_n$.

(f) Suppose $(\varepsilon_n)$ is a sequence of positive numbers with $\lim \varepsilon_n = 0$. Does the
condition "$\lim(n^{1+\varepsilon_n})a_n = 0$" guarantee convergence of $\sum a_n$? Are there
conditions that could be put on the sequence $(\varepsilon_n)$ that would change this
result (either way)?


5. Given a sequence $(a_n)$ with $a_n > 0$ for all n, let $\rho = \lim \sqrt[n]{a_n}$.


(a) Show that $\sum a_n$ converges if $\rho < 1$ and diverges if $\rho > 1$.


(b) Show that no conclusion can be drawn if $\rho = 1$. (This is called the Root
Test. It will play an important role in Chapter 15.)


13.9 ALTERNATING SERIES 


Our results so far have been applicable mainly to positive series. Now we will 


look at series having both positive and negative terms. In the simplest such 
situation, the terms of the series alternate signs. Not only does this make it easy 
to keep track of the signs, such series can be dealt with by a remarkably simple 
convergence test. 


DEFINITION 13.14: A series is alternating if it can be written in the form
$\sum (-1)^n a_n$ or in the form $\sum (-1)^{n+1} a_n$, where $a_n > 0$.(°)

THEOREM 13.15: (The Alternating Series Test) If $a_n > a_{n+1} > 0$ for all n and
$\lim a_n = 0$, then $\sum (-1)^n a_n$ and $\sum (-1)^{n+1} a_n$ converge.


PROOF: We will assume the first term in the series is positive. Here is a
diagram showing the first few partial sums of such a series:

[Figure: a number line marking, from left to right, the partial sums $a_1 - a_2$,
$a_1 - a_2 + a_3 - a_4$, ..., $a_1 - a_2 + a_3 - a_4 + a_5$, $a_1 - a_2 + a_3$, $a_1$.]

We see that $s_1 > s_3 > s_5 \dots$ and $s_2 < s_4 < s_6 \dots$. The intervals $[s_{2n}, s_{2n-1}]$ form a
nest of closed, bounded intervals, and the length of $[s_{2n}, s_{2n-1}]$ is $a_{2n}$, which
approaches 0. Let

$$\{S\} = \bigcap_n\, [s_{2n}, s_{2n-1}]$$

and let $\varepsilon > 0$ be given. If N is chosen so that $a_{2N} < \varepsilon$, we have

$$[s_{2n}, s_{2n-1}] \subseteq [s_{2N}, s_{2N-1}] \subseteq (S - \varepsilon, S + \varepsilon)$$

for $n \ge N$. It follows that $\sum (-1)^{n+1} a_n = S$. ∎


COROLLARY 13.16: If $\sum (-1)^n a_n$ or $\sum (-1)^{n+1} a_n$ is an alternating series with
limit S, then $|s_n - S| \le a_n$ for each n. ∎


(Observe that this is a corollary to the proof of Theorem 13.15, not to the result 
itself.) 


EXAMPLES 13.9: 1. “a converges by the Alternating Series test. 

2. $\sum \frac{(-1)^{n+1}}{n}$ converges by the Alternating Series test. This is called the
Alternating Harmonic Series. We can use Corollary 13.16 to find an 
approximation to its sum. Adding 200 terms gives us a sum of approximately 
0.690, which is within 1/200 of the actual limit. This seems to be a lot of work to 
do for very little accuracy. 
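
The 200-term computation is easy to repeat. In the Python sketch below, the function name is hypothetical, and the comparison value ln 2 is a standard fact about the alternating harmonic series that has not been established in this section; it is used here only to measure the error.

```python
import math

def alternating_harmonic_partial_sum(n):
    # 1 - 1/2 + 1/3 - ... up to and including the term with denominator n
    return sum((-1) ** (k + 1) / k for k in range(1, n + 1))

s_200 = alternating_harmonic_partial_sum(200)
error = abs(s_200 - math.log(2))

# Corollary 13.16 promises an error no larger than a_200 = 1/200.
assert error <= 1 / 200
print(s_200, error)
```

The observed error is roughly half the guaranteed bound, which illustrates how slowly this series converges.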

3. $\sum \frac{\sin n}{n^2}$ is not an alternating series, and so the test will not work. We can show
that it converges anyway, using Theorem 13.6. Note that

$$\left|\frac{\sin n}{n^2} + \cdots + \frac{\sin m}{m^2}\right| \le \frac{1}{n^2} + \cdots + \frac{1}{m^2},$$

which may be made as small as we wish because $\sum \frac{1}{n^2}$ converges.


EXERCISES 13.9 


1. (a) Construct a proof of the Alternating Series test based on the Monotone 
Convergence theorem (rather than the Nested Intervals property). 


(b) Construct a proof of the Alternating Series test based on the Cauchy 
criterion. 


(c) Construct a proof of the Alternating Series test based on the Least Upper 
Bound property. 


(d) Construct a proof of the Alternating Series test based on the
Bolzano-Weierstrass theorem.


2. (a) Show that the numbers sin(1), sin(2), sin(3), ..., don't alternate signs. 
(b) Show that no more than four of these in a row can have the same sign. 


(c) Does it ever happen that four in a row do have the same sign? 


3. (a) Does the picture drawn in the proof of Theorem 13.15 support the claims 
made about it? That is, could the situation be different enough from the one 
shown to change any of the results? 


(b) Draw a picture that describes the Alternating Series test if the first term in
the series is negative.


4. Show that the condition $a_n > a_{n+1}$ in Theorem 13.15 is necessary. (That is,
construct an alternating series whose terms go to zero that diverges.)


5. (a) Show that the series indicated by 1+1/2—1/3—1/4+1/5+1/6-... converges. 
(This is the harmonic series with signs changing every second term.) 
(b) Suppose we insert the signs +, +, +, —,-, -, +, +, +, ..., in the harmonic 
series. Show that the resulting series converges. 


(c) Suppose we insert the signs +, —, —, +, —, —, ..., in the harmonic series. 
Show that the resulting series diverges. 


(d) Note that in parts (a) and (b), the number of + signs that appear before a 


given term in the series is always just about the same as the number of — 
signs. In part (c), though, the number of + signs is always about half the 
number of — signs. Consider the following (whose proof is beyond the scope 
of this book): Suppose that for each term of the harmonic series we flip a fair 
coin, inserting a + before the term if the coin comes up heads, and a — if it 
comes up tails. In this way, the number of + signs and the number of — signs 
will be, in the long run, about the same (this is what "fair" means). Does such 
a series converge? 


13.10 ABSOLUTE AND CONDITIONAL CONVERGENCE 


In Example 13.9.3, we decided the convergence of a series by observing that the 
series consisting of the absolute values of its terms converges. 


DEFINITION 13.17: The series $\sum a_n$ is said to converge absolutely (or to be
absolutely convergent) if the series $\sum |a_n|$ converges.


If a series has any negative terms, whether it is absolutely convergent depends
upon the behavior of a different series (if a series has only finitely many negative
terms, Theorem 13.3 shows that its convergence and that of its "absolute" series
are equivalent). The previous example leads us to suspect:


THEOREM 13.18: If $\sum a_n$ is absolutely convergent, it is convergent.


PROOF: We need only notice that the argument used in Example 13.9.3 is very
general: For n > m, we have $|a_m + \dots + a_n| \le |a_m| + \dots + |a_n|$. If the series
converges absolutely, the right side can be made as small as desired by Theorem
13.6. Applying Theorem 13.6 to the left side gives us our result. ∎


Students of calculus often find it hard to appreciate Theorem 13.18, since 
superficially it doesn't seem like much is being said. It is important to remember 
that, despite their similar names, convergence and absolute convergence are very 
different concepts. 

The alternating harmonic series (Example 13.9.2) converges by the 
Alternating Series test, but we have shown (repeatedly!) that it does not 
converge absolutely. Such a series is called conditionally convergent, for 
reasons that will be clear shortly. Nothing we've done so far would suggest that 
finding the limit of a series and simply adding terms are really different. The 
behavior of conditionally convergent series will show us just how different these 


processes can be. 


DEFINITION 13.19: Suppose $\pi : \mathbb{N} \to \mathbb{N}$ is one-to-one and onto. The series
$\sum a_{\pi(n)}$ is called a rearrangement of $\sum a_n$.


An obvious adjustment must be made in this definition if the index of the series
does not begin with 1. The letter $\pi$ is used here to remind us of "permutation."


THEOREM 13.20: If $\sum a_n$ is absolutely convergent and $\sum a_{\pi(n)}$ is a
rearrangement of $\sum a_n$, then $\sum a_{\pi(n)}$ also converges absolutely and $\sum a_{\pi(n)} = \sum a_n$.


PROOF: Let us denote $\sum a_{\pi(n)}$ by $\sum b_n$ and let $A = \sum |a_n|$. The sum of any
collection of the terms $|a_n|$ is bounded above by A, and so

$$|b_1| + \cdots + |b_n| = |a_{\pi(1)}| + \cdots + |a_{\pi(n)}| \le A.$$

Thus $\sum b_n$ is absolutely convergent, and so it converges. Let the partial sums of
$\sum a_n$ be $(s_n)$ and the partial sums of $\sum b_n$ be $(t_n)$, and let $S = \sum a_n$ and $T = \sum b_n$.
For a given $\varepsilon > 0$, choose $N_0$ and $N \ge N_0$ so that the following all hold:
(i) $|s_n - S| < \varepsilon/4$ for $n \ge N_0$;
(ii) $|t_n - T| < \varepsilon/4$ for $n \ge N_0$;
(iii) all the terms of $s_{N_0}$ appear in $t_N$;
and (iv) all the terms of $t_{N_0}$ appear in $s_N$.


The first two conditions guarantee that any sum of $a_n$'s or $b_n$'s, all of whose
subscripts are larger than $N_0$, is less than $\varepsilon/4$ in absolute value. Then

$$|S - T| = |(S - s_N) + s_N - (T - t_N) - t_N| \le |S - s_N| + |T - t_N| + |s_N - t_N|.$$

Now $s_N - t_N$ consists of $a_n$'s and $b_n$'s having subscripts larger than $N_0$ (all the
terms with smaller subscripts cancel). Since $N \ge N_0$, this last expression is not
larger than

$$\varepsilon/4 + \varepsilon/4 + \left|\sum (\text{leftover } a_n\text{'s})\right| + \left|\sum (\text{leftover } b_n\text{'s})\right| < \varepsilon,$$

and so S = T. ∎


The reassuring behavior of absolutely convergent series is in sharp contrast to
Theorem 13.22 below. First we need a lemma. Given a series $\sum a_n$ (which we
assume to have infinitely many positive and infinitely many negative terms), we
denote by $\sum p_n$ the series consisting of only the positive terms of $\sum a_n$ and by
$\sum q_n$ the series consisting of only the negative terms of $\sum a_n$.


LEMMA 13.21: (a) $\sum a_n$ converges absolutely if and only if both $\sum p_n$ and $\sum q_n$
converge (and if this is so, $\sum a_n = \sum p_n + \sum q_n$).
(b) If $\sum a_n$ converges conditionally, both $\sum p_n$ and $\sum q_n$ diverge.


PROOF: (a) Suppose $\sum a_n$ converges absolutely. Let the partial sums of $\sum a_n$ be
$(s_n)$ and the partial sums of $\sum p_n$ be $(u_n)$. If $k_n$ is chosen so that all the terms of $u_n$
appear in $s_{k_n}$, we have $u_n = |u_n| \le |a_1| + \cdots + |a_{k_n}| \le \sum |a_n|$, and so $\sum p_n$ converges.
That $\sum q_n$ converges is shown similarly. That $\sum a_n = \sum p_n + \sum q_n$ is left as
Exercise 13.10.1. Now suppose $\sum p_n$ and $\sum q_n$ both converge. The terms of $\sum q_n$
are all negative, and so it converges absolutely (be sure you see why this is so).
Any partial sum of $\sum |a_n|$ is bounded by the sum of appropriately chosen partial
sums of $\sum p_n$ and $\sum |q_n|$, all of which are bounded, so $\sum |a_n|$ converges.

(b) There are three possibilities for $\sum p_n$ and $\sum q_n$: both converge, both diverge,
or one converges while the other diverges. The first has been shown to force
absolute convergence of $\sum a_n$, and so we need only establish that the last forces
$\sum a_n$ to diverge. Suppose $\sum p_n$ diverges and $\sum q_n = S$. Any partial sum of $\sum a_n$ is
larger than

[an appropriate partial sum of $\sum p_n$] + S.

Since the partial sums of $\sum p_n$ are unbounded, so are those of $\sum a_n$, and $\sum a_n$
diverges. ∎


THEOREM 13.22: If $\sum a_n$ is conditionally convergent, it can be made by a
suitable rearrangement to do any of the following:

(a) converge to any given real number;
(b) diverge because its partial sums are unbounded above, below, or both;
(c) diverge because its sequence of partial sums has two cluster points (which
can be chosen arbitrarily).


PROOF: We will show (a). This is a proof where too much precision only
muddles the issues. You may supply the details if you are doubtful. Let L be a
real number (suppose L > 0). Since $\sum p_n$ diverges, it has a partial sum larger
than L. Take the first such partial sum as the beginning of the rearrangement.
Suppose this partial sum adds up to L + c. Since $\sum q_n$ diverges (and since its
partial sums approach $-\infty$), it has a first partial sum that is less than $-c$. Use
these terms as the next part of the rearrangement.


Suppose the sum so far is L − d. Deleting the terms we have used from the
beginning of $\sum p_n$ does not cause it to converge, and so the remaining series has
a partial sum exceeding d. Use these as the next part of the rearrangement, and
so on. Observe: We never run out of p's or q's; we never use a p or a q more than
once; each time we change from selecting p's to selecting q's we use at least one;
and we make this change infinitely often. Because of all of this, our procedure
yields a rearrangement of $\sum a_n$. Furthermore, since $p_n$ and $q_n$ both approach 0
(because $\sum a_n$ converges), we "overshoot" L each time by distances that
approach 0 (though these overshoots may not go monotonically to 0). Thus our
rearrangement converges to L. ∎
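
The procedure in this proof can be carried out mechanically. The Python sketch below applies it to the alternating harmonic series; the function name `rearranged_terms` and the target 1.5 are our own illustrative choices, and the rule "take a positive term while the running sum is at or below the target, otherwise take a negative term" is a term-by-term restatement of the block-by-block description above rather than a quotation of it.

```python
from itertools import count

def rearranged_terms(target, num_terms):
    # Unused positive terms 1, 1/3, 1/5, ... and negative terms -1/2, -1/4, ...
    positives = (1.0 / n for n in count(1, 2))
    negatives = (-1.0 / n for n in count(2, 2))
    total = 0.0
    terms = []
    for _ in range(num_terms):
        # At or below the target: use the next positive term.
        # Above the target: use the next negative term.
        term = next(positives) if total <= target else next(negatives)
        terms.append(term)
        total += term
    return terms

terms = rearranged_terms(1.5, 100_000)
print(terms[:8])
print(sum(terms))
```

Each term of the original series is drawn from its generator at most once, so the output is the beginning of a genuine rearrangement, and the running sum stays within the size of the most recently used terms of the target.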


EXERCISES 13.10 


1. Verify the assertions in the proof of Theorem 13.20 that:
(a) The sum of any collection of the terms $|a_n|$ is bounded by A.
(b) Any sum of $a_n$'s or $b_n$'s all of whose subscripts are larger than $N_0$ is less
than $\varepsilon/4$ in absolute value.
(c) $N \ge N_0$.


2. Complete the proof of Lemma 13.21.a. 


3. (a) Draw a picture to illustrate the proof of Theorem 13.22.a. 
(b) Mimic the proof of Theorem 13.22.a to find the first 20 terms of a 
rearrangement of the alternating harmonic series that converges to 2. 

4. (a) Complete the proof of Theorem 13.22. 


(b) Show that a conditionally convergent series can be rearranged so that its 
set of sequential cluster points is an arbitrarily chosen finite set. 


(c) Show that a conditionally convergent series can be rearranged so that its 
set of sequential cluster points is N. 


(d) Is it possible to rearrange a conditionally convergent series so that its set 
of sequential cluster points is Q? 


(e) Is it possible to rearrange a conditionally convergent series so that its set 
of sequential cluster points is R? 


5. (a) Suppose $\pi : \mathbb{N} \to \mathbb{N}$ is a one-to-one, onto function with the property that
there is a natural number N so that $\pi(n) = n$ for n > N (that is, "only finitely
many numbers get moved by $\pi$"). Show that a rearrangement by $\pi$ does not
change the convergence or limit of any series.


(b) If $\pi : \mathbb{N} \to \mathbb{N}$ is a one-to-one, onto function so that $\pi(n) \neq n$ for infinitely
many values of n, is there necessarily a conditionally convergent series
whose convergence is changed by rearrangement by $\pi$?


(c) If your answer to (b) is no, consider whether there is a further condition
on a rearrangement $\pi$ that would guarantee that the convergence of some
series is changed upon application of $\pi$.


(d) If your answer to (b) is yes, consider whether there is a further condition
on a rearrangement $\pi$ that would guarantee that no series has its convergence
changed upon application of $\pi$.


6. The definition of "subsequence" (Definition 9.17) looks something like the
definition of "rearrangement." We might define a "subseries" by saying that
$\sum a_{n_k}$ is a subseries of $\sum a_n$ if $(a_{n_k})$ is a subsequence of $(a_n)$.


(a) If the series $\sum a_n$ converges, does any subseries necessarily converge?


(b) Are there conditions on (a,,) that would yield a positive result in (a)? 


(c) Is a subseries necessarily a rearrangement? Is a rearrangement a 
subseries? 


(d) Explore possible analogues of Theorem 10.11. When can we draw 
conclusions about the convergence of a series from the behavior of 
subseries? 


(e) Is the Cauchy Condensation test a statement about subseries? 


(f) Remember that a series is a special sort of sequence. Is a subseries 
necessarily a subsequence (of the series)? 


13.11 CAUCHY PRODUCTS 


In this section we consider multiplication of series. Our goal should be to find a
means of defining the product of two series in such a way that, if $\sum a_n = A$ and
$\sum b_n = B$, then the product of $\sum a_n$ and $\sum b_n$ is AB. Why are we being so vague
about this? Isn't it clear enough that the product of $\sum a_n$ and $\sum b_n$ is $\sum a_n b_n$? We
can try this out on series whose limits we know.


We know that $\sum_{n=0}^\infty \frac{1}{2^n} = 2$. If we multiply this series by itself in the way we
have guessed, we have

$$4 = 2 \times 2 = \left(\sum_{n=0}^\infty \frac{1}{2^n}\right) \times \left(\sum_{n=0}^\infty \frac{1}{2^n}\right) = \sum_{n=0}^\infty \frac{1}{4^n} = \frac{4}{3},$$


which is certainly not what we want. Upon reflection, though, we see that our
guess at a definition of multiplication was silly. Consider series that begin
$a_0 + a_1 + \dots + a_n$ and $b_0 + b_1 + \dots + b_n$ and the rest of whose terms are 0. The product
of such series should be the same as the product of the sums:

$$(a_0 + a_1 + \cdots + a_n)(b_0 + b_1 + \cdots + b_n) = (a_0b_0) + (a_0b_1 + a_1b_0) + (a_0b_2 + a_1b_1 + a_2b_0) + \cdots + a_nb_n.$$


Notice the terms at the beginning of this sum. The two subscripts in each term in 
a group add up to the same number. The subscripts on the a's increase from 0 to 
this number, while those on the b's decrease from this number to 0. Note also 
that this sum of subscripts tells us the position of the group in the whole sum. 
Describing this pattern symbolically gives us: 


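$$\left(\sum_{n=0}^\infty a_n\right)\left(\sum_{n=0}^\infty b_n\right) = \sum_{n=0}^\infty \sum_{k=0}^n a_k b_{n-k}.$$

This is called the Cauchy Product of $\sum_{n=0}^\infty a_n$ and $\sum_{n=0}^\infty b_n$. (We must make
adjustments if the first subscripts in the series are not 0.)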
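
The definition translates directly into a short computation. The Python sketch below is only an illustration, with the hypothetical helper `cauchy_product_terms`; it forms the first 30 terms of the Cauchy product of $\sum 1/2^n$ with itself in exact arithmetic and compares the resulting partial sum with that of the naive termwise product $\sum 1/4^n$.

```python
from fractions import Fraction

def cauchy_product_terms(a, b, num_terms):
    # c_n = a_0 b_n + a_1 b_{n-1} + ... + a_n b_0 for n = 0, ..., num_terms - 1
    return [sum(a(k) * b(n - k) for k in range(n + 1)) for n in range(num_terms)]

def geometric(n):
    # n-th term of the series 1 + 1/2 + 1/4 + ... (indexing starts at 0)
    return Fraction(1, 2**n)

terms = cauchy_product_terms(geometric, geometric, 30)
termwise = [geometric(n) * geometric(n) for n in range(30)]

print([str(c) for c in terms[:5]])
print(float(sum(terms)), float(sum(termwise)))
```

The partial sums of the Cauchy product approach 4, the product of the two limits, while the termwise product stays near 4/3.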


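EXAMPLES 13.11: 1. Consider the previous example again. The Cauchy
product of the series $\sum_{n=0}^\infty \frac{1}{2^n}$ with itself is

$$\sum_{n=0}^\infty \sum_{k=0}^n \left(\frac{1}{2^k}\right)\left(\frac{1}{2^{n-k}}\right) = \sum_{n=0}^\infty \sum_{k=0}^n \frac{1}{2^n} = \sum_{n=0}^\infty \frac{n+1}{2^n}.$$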

The last series is easily seen to converge by the Ratio test. We will use a trick to 
find its limit. Consider the following array: 


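$$\begin{array}{ccccccccccc}
1 & + & \frac{1}{2} & + & \frac{1}{4} & + & \frac{1}{8} & + & \frac{1}{16} & + & \cdots \\
  &   & \frac{1}{2} & + & \frac{1}{4} & + & \frac{1}{8} & + & \frac{1}{16} & + & \cdots \\
  &   &             &   & \frac{1}{4} & + & \frac{1}{8} & + & \frac{1}{16} & + & \cdots \\
  &   &             &   &             &   & \vdots      &   &              &   &
\end{array}$$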

Each row of this array is a geometric series with r = 1/2. The limits of the rows 
are 2, 1, 1/2, 1/4, .... Adding these, we obtain a geometric series whose limit is 4 
(which is, we note, the hoped-for product). On the other hand, if we add the 
columns in this array first, and then add those results, we obtain 


$$1 + \frac{2}{2} + \frac{3}{4} + \frac{4}{8} + \cdots = \sum_{n=0}^\infty \frac{n+1}{2^n}.$$


Thus the limit of the Cauchy product of this series with itself is the product of 
the limit of the series with itself, as desired. On the other hand ... 


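2. The series $\sum_{n=0}^\infty \frac{(-1)^n}{\sqrt{n+1}}$ converges by the Alternating Series test. The Cauchy
product of this series with itself is:

$$\sum_{n=0}^\infty \sum_{k=0}^n \frac{(-1)^k (-1)^{n-k}}{\sqrt{(k+1)(n-k+1)}} = \sum_{n=0}^\infty (-1)^n \sum_{k=0}^n \frac{1}{\sqrt{(k+1)(n-k+1)}}.$$

But now

$$(k+1)(n-k+1) = n + 1 + nk - k^2 = \left(\frac{n}{2} + 1\right)^2 - \left(\frac{n}{2} - k\right)^2 \le \left(\frac{n}{2} + 1\right)^2,$$

so that

$$\sum_{k=0}^n \frac{1}{\sqrt{(k+1)(n-k+1)}} \ge \frac{2(n+1)}{n+2} \to 2 \text{ as } n \to \infty.$$

Since $2 \neq 0$, the Cauchy product of $\sum_{n=0}^\infty \frac{(-1)^n}{\sqrt{n+1}}$ with itself diverges by the nth Term
test.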
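
The failure of the terms to approach 0 can be seen numerically. The Python sketch below uses the hypothetical helper `cauchy_term` and compares each term with the lower bound 2(n + 1)/(n + 2), which follows from the arithmetic-geometric mean inequality $\sqrt{(k+1)(n-k+1)} \le (n+2)/2$.

```python
import math

def cauchy_term(n):
    # n-th term of the Cauchy product of sum (-1)^k / sqrt(k + 1) with itself
    magnitude = sum(1.0 / math.sqrt((k + 1) * (n - k + 1)) for k in range(n + 1))
    return (-1) ** n * magnitude

for n in (0, 1, 10, 100, 1000):
    lower_bound = 2 * (n + 1) / (n + 2)
    print(n, cauchy_term(n), lower_bound)
```

The terms alternate in sign, but their absolute values never drop below 1, so the nth Term test applies.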


Evidently we do not yet know the whole story. In light of the previous section,
though, one difference between the series in these two examples should jump out
at us: The series $\sum_{n=0}^\infty \frac{1}{2^n}$ is absolutely convergent, while $\sum_{n=0}^\infty \frac{(-1)^n}{\sqrt{n+1}}$ is
conditionally convergent.


THEOREM 13.23: If $\sum a_n$ converges absolutely and $\sum b_n$ converges, then the
Cauchy product of $\sum a_n$ and $\sum b_n$ converges. If $\sum a_n = A$ and $\sum b_n = B$, their
Cauchy product converges to AB.


PROOF: This proof is largely a bit of careful bookkeeping. We also see again
the standard procedure of splitting a quantity into parts, to be dealt with
separately. Let $A_n = \sum_{k=0}^n a_k$ and $B_n = \sum_{k=0}^n b_k$ and let $R_n = B_n - B$. Then we have

$$\begin{aligned}
\sum_{n=0}^N \sum_{k=0}^n a_k b_{n-k}
&= (a_0b_0) + (a_0b_1 + a_1b_0) + \cdots + (a_0b_N + \cdots + a_Nb_0) \\
&= a_0B_N + a_1B_{N-1} + \cdots + a_NB_0 \\
&= a_0(B + R_N) + a_1(B + R_{N-1}) + \cdots + a_N(B + R_0) \\
&= A_NB + (a_0R_N + a_1R_{N-1} + \cdots + a_NR_0).
\end{aligned}$$

We want to show that $\lim_{N\to\infty} \sum_{n=0}^N \sum_{k=0}^n a_k b_{n-k} = AB$. Since $\lim A_N = A$, and
$B \in \mathbb{R}$ (since $\sum b_n$ converges), we have that $\lim A_NB = AB$. All we need to show
is that the expression in parentheses in the last line above goes to 0 as $N \to \infty$.
Let $\varepsilon > 0$ be given. Since $\sum b_n$ converges, we have $\lim R_N = 0$, and so there is
a number $N_0$ so that $|R_N| < \varepsilon$ whenever $N \ge N_0$. For any such value of N we have:

$$\begin{aligned}
|a_0R_N + a_1R_{N-1} + \cdots + a_NR_0|
&\le |a_0R_N + a_1R_{N-1} + \cdots + a_{N-N_0}R_{N_0}| + |a_{N-N_0+1}R_{N_0-1} + \cdots + a_NR_0| \\
&\le \varepsilon(|a_0| + \cdots + |a_{N-N_0}|) + |a_{N-N_0+1}R_{N_0-1} + \cdots + a_NR_0| \\
&\le \varepsilon \sum_{k=0}^{N-N_0} |a_k| + R(|a_{N-N_0+1}| + \cdots + |a_N|),
\end{aligned}$$

where $R = \max\{|R_j| : j = 0, \dots, N_0 - 1\}$. The sum in the left-hand term is
bounded because $\sum a_n$ converges absolutely. Since $\lim a_N = 0$, the expression in
parentheses on the right can be made as small as we wish. (Note that the number
of terms in the sum does not change as $N \to \infty$.) Thus we may make
$|a_0R_N + a_1R_{N-1} + \dots + a_NR_0|$ small by making N large enough; that is,
$\lim_{N\to\infty} |a_0R_N + a_1R_{N-1} + \dots + a_NR_0| = 0$. ∎


EXERCISES 13.11 


1. The trick used to find the limit of the series $\sum_{n=0}^\infty \frac{n+1}{2^n}$ can be used to show


that the harmonic series diverges. We may write 


Dole bole 


(note that the numerators of the fractions are now 1, 2, 3,..., as they were in 


the other example). Starting here, use the idea in the example to show that 
the harmonic series diverges. 


2. Given a series $\sum x_n$, define a sequence $(\sigma_n)$ by

$$\sigma_n = \frac{nx_1 + (n-1)x_2 + \cdots + 2x_{n-1} + x_n}{n}.$$

(a) Discuss where this formula "came from." (Compare it with the formula in
Exercise 9.2.9.)


(b) If $\sum x_n = s$, show that $\lim \sigma_n = s$.


(c) Show that it is possible for $\lim \sigma_n$ to exist when $\sum x_n$ diverges. (This is
called Cesàro summation. There are more ways to find the limit of a series
than just "adding them all up." Compare this further with Exercise 15.5.5.)


3. Consider again the first example following the definition of the Cauchy 
product. Is there anything in this argument that might require further 
explanation? 


4. Is it possible for the Cauchy product of two series to converge to the product 
of their limits even if neither of them converges absolutely? 


5. A double-ended series is one of the form $\sum_{n=-\infty}^\infty a_n$. Such a series is said to
converge if both $\sum_{n=0}^\infty a_n$ and $\sum_{n=-\infty}^{-1} a_n$ converge. In this case, the limit of the
series is the sum of these two values.


(a) Show that the choice of "break point" doesn't matter: The double series
converges if and only if both $\sum_{n=-\infty}^m a_n$ and $\sum_{n=k}^\infty a_n$ converge for any two
integers m and k. (But the sum of these two limits is in general the limit of
the double-ended series only if k = m + 1.)


(b) Show that if $\sum_{n=-\infty}^\infty a_n = L$, then $\lim_{k\to\infty} \sum_{n=-k}^k a_n = L$.


(c) The result in (b) seems to provide an easier and more intuitive method of
checking whether a double-ended series converges, but show that it is
possible to have $\lim_{k\to\infty} \sum_{n=-k}^k a_n$ exist without having $\sum_{n=-\infty}^\infty a_n$ converge.

(d) Show that the results in (b) and (c) are true for $\lim_{k\to\infty} \sum_{n=-k}^{2k} a_n$.

(e) Show that if $f : \mathbb{N} \to \mathbb{N}$ is any strictly increasing function, it is possible to
have $\lim_{k\to\infty} \sum_{n=-k}^{f(k)} a_n$ exist without having $\sum_{n=-\infty}^\infty a_n$ converge.


6. (a) Show that every series of the form $\sum_{n=1}^\infty \frac{d_n}{10^n}$ converges, where $d_n$ can be
any digit (0, 1, ..., 9).


(b) Show that any real number can be written $N + \sum_{n=1}^\infty \frac{d_n}{10^n}$, with the series as
in (a) and N an integer. (These exercises allow us to represent real numbers
as decimal expansions using the theory of series rather than nested intervals.)


(c) Repeat (a) and (b) using series having denominators of $2^n$ (and using
digits 0, 1) or $3^n$ (with digits 0, 1, 2).


(d) Show that any real number can be written $\sum_{n=-\infty}^\infty \frac{d_n}{10^n}$, where only finitely
many of the $d_n$'s with negative subscripts are nonzero.


(e) Suppose that in part (a), we have $d_n = 9$ for all n. What is the limit?


(f) Consider the possibility of noninteger number bases: Let b be a number
larger than 1 but not an integer. Can every real number be represented in
the form $\sum_{n=-\infty}^\infty \frac{d_n}{b^n}$? What should the range of choices for the $d_n$'s be? Are
there choices of b for which there are real numbers with two such
representations? No such representation?


7. Consider the infinite product $\prod_{k=1}^{\infty} (1 + a_k)$, where we assume that $a_k > 0$.
(The reason for the peculiar construction will be evident in a moment.) Such
a product is said to converge if the sequence of partial products,
$p_n = \prod_{k=1}^{n} (1 + a_k)$, converges.


(a) Show that the product converges if and only if $\sum a_n$ converges. (We
write the product as we do to make it easy to single out the $a_k$'s.)


(b) In order for a series to converge, its terms must approach 0, but for an 
infinite product to converge, its terms must approach 1. Explain. 


8. (a) Draw a circle of radius 1. Inscribe in it a square. Inscribe in the square 
another circle. Inscribe in this circle an octagon. Keep up this way, doubling 
the number of sides in the polygon at each step. Find the radii of the first few 
circles. Do the radii approach a positive limit? 


(b) Repeat (a), except instead of doubling the number of sides of the polygon 
at each step, just increase it by one (triangle, square, pentagon, hexagon, ...). 


1 Even if the first index in $(a_n)$ is not 1, we will define $s_n$ to be the sum whose last term is $a_n$ (so the
sum defining $s_n$ might not have n terms).


2 The numbers 1, 1/2, 1/3, ..., are referred to as "harmonics," a reference to the role they play in music. If
one of two identical vibrating strings is held at a point 1/2 or 1/3 of the way along its length, a chord that
"sounds good" is produced. Touching a string at other places simply tends to deaden it. This may account
for the fascination of the Greeks with rational numbers, a fascination that gave birth to the subject we now
call number theory.


3 If we say $\sum a_n$ "is an alternating series," then the $a_n$'s we have just written down are not the $a_n$'s in
this definition. When we are discussing alternating series, we will always assume that the series has been
written in the form suggested in the definition, so that $a_n > 0$ for all n.


Chapter 14 


Uniform Continuity 


14.1 UNIFORM CONTINUITY 


Applications of analysis consist in large part of approximation procedures 
accompanied by estimates of the errors involved in their use.¹ We make the
simplest sort of approximation when we use one value of a function as an 
estimate for another value. Here are two examples: 


EXAMPLES 14.1: 1. Let f(x) = sin(x). By the Mean Value theorem, if x and y
are real numbers, there is a real number c between them so that

$$\begin{aligned}
|\sin(x) - \sin(y)| &= |f(x) - f(y)| \\
&= |f'(c)|\,|x - y| \\
&= |\cos(c)|\,|x - y| \\
&\le |x - y|.
\end{aligned}$$


If x and y are close together, then sin(x) and sin(y) are close together, in a sense 
made precise by this inequality. The inequality measures the error we suffer 
when we use the value of sin(x) as an estimate for the value of sin(y). Observe 
that the error depends on the distance between x and y (as we would expect) but 
not on where x and y are on the number line. The same estimate may be used 
everywhere. For this reason, the estimate is said to be uniform (this will be 
defined formally in a moment). We can't always get a uniform error estimate ... 


2. Let $g(x) = x^2$. By the Mean Value theorem again, if x and y are real numbers,
there is a real number c between them so that

$$|x^2 - y^2| = |2c|\,|x - y|.$$

Observe that c (and the right side of the error estimate of which it is a part) must
get larger as x and y get larger (since c is between x and y). The error incurred in
approximating $x^2$ by $y^2$ seems to depend not only on the distance between x and
y but on where they are on the number line. It appears that this error estimate is
not uniform.
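The contrast between the two estimates can be checked numerically. The sketch below is only an illustration, not part of the argument. The helper name `gap`, the fixed distance `d`, and the sample locations are assumptions introduced here. It measures the error |F(x + d) − F(x)| at several places on the number line for F = sin and for F(x) = x².

```python
import math

def gap(func, x, d):
    """Size of the error |func(x + d) - func(x)| when one value estimates the other."""
    return abs(func(x + d) - func(x))

d = 0.01                                   # fixed distance between the two inputs
locations = [0.0, 10.0, 100.0, 1000.0]     # where the pair sits on the number line

sin_gaps = [gap(math.sin, x, d) for x in locations]
square_gaps = [gap(lambda t: t * t, x, d) for x in locations]

for x, s, q in zip(locations, sin_gaps, square_gaps):
    print(f"x = {x:7.1f}   sin gap = {s:.6f}   square gap = {q:.6f}")
```

The sine errors stay below d wherever the pair is placed, while the errors for the squaring function grow with the location. This is numerical evidence only; it does not by itself show that no uniform estimate exists for x².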


The difference between these examples is one of the most subtle issues in 
calculus. It may not be immediately clear that a uniform estimate is inherently 
better than one that is not (we will soon see that uniform estimates are crucial in 
a number of calculus theorems, though). Recall that the function $f: A \to \mathbb{R}$ is
continuous everywhere if

$$\forall a \in A\ \forall \varepsilon > 0\ \exists \delta > 0 \ni \big((x \in A \text{ and } |x - a| < \delta) \Rightarrow |f(x) - f(a)| < \varepsilon\big).$$

The existentially quantified variable $\delta$ appears after a and $\varepsilon$ in the list of
variables, and so $\delta$ may depend on both of the others. If we don't want $\delta$ to
change from place to place, we should not allow it to depend on a. We can
accomplish this by changing the order of the variables in the list (but don't take 
the fact that this change is easy to make typographically to be an indication that 
the concept is not important): 


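DEFINITION 14.1: The function $f: A \to \mathbb{R}$ is said to be uniformly continuous
if

$$\forall \varepsilon > 0\ \exists \delta > 0 \ni \forall a \in A\ \big((x \in A \text{ and } |x - a| < \delta) \Rightarrow |f(x) - f(a)| < \varepsilon\big).$$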


This is probably not something that should be remembered in symbolic form. 
Rather, we should remember it this way: 


A function is uniformly continuous if there is a choice of $\delta$ in the definition of
continuity that works for every point $a \in A$.
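To see what "a choice of $\delta$ that works for every point" rules out, it may help to compute the largest $\delta$ that works at a single point a for the squaring function. A short calculation (added here, not taken from the text) shows that $|x^2 - a^2| < \varepsilon$ for all $|x - a| < \delta$ exactly when $2|a|\delta + \delta^2 \le \varepsilon$, so the largest admissible value is $\sqrt{a^2 + \varepsilon} - |a|$. The helper name `best_delta` below is hypothetical.

```python
import math

def best_delta(a, eps):
    """Largest delta with |x*x - a*a| < eps whenever |x - a| < delta.

    It is the positive solution of 2*|a|*delta + delta**2 = eps.
    """
    return math.sqrt(a * a + eps) - abs(a)

eps = 1.0
for a in [0, 1, 10, 100, 1000]:
    print(f"a = {a:5d}   best delta = {best_delta(a, eps):.6f}")
```

The printed values shrink toward 0 as a grows. In this sketch the $\delta$ that works at one point stops working farther out, which is the dependence on a that the definition of uniform continuity forbids.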


Unfortunately, the topological characterization of uniform continuity is not very 
useful to us. Uniform continuity is essentially a topic in hard analysis. 


The examples at the beginning of the chapter show that f(x) = sin(x) is 
uniformly continuous and that $g(x) = x^2$ is not.


EXAMPLES 14.1: 3. Let h(x) = 1/x for x > 0. Given $0 < \varepsilon < 1$ and any $\delta > 0$,
there is a number n so that $|1/n - 1/(n + 1)| < \delta$ but $|h(1/n) - h(1/(n + 1))| = |n - (n + 1)| = 1 > \varepsilon$.
Consequently h is not uniformly continuous.


4. The function h(x) = 1/x is uniformly continuous if its domain is taken to be
(1, 2). This may be seen as in Example 14.1.1, since $|h'(x)| < 1$ on (1, 2). Notice that
changing the domain has changed the result. Uniform continuity depends on
both the function and the domain.
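Examples 3 and 4 can both be checked with exact rational arithmetic. The following is a minimal sketch; the grid of sample points in (1, 2) is an arbitrary choice made here, and a finite grid can only suggest, not prove, the bound on (1, 2).

```python
from fractions import Fraction

def h(x):
    """The function h(x) = 1/x, evaluated exactly on rational inputs."""
    return 1 / Fraction(x)

# Example 3: the inputs 1/n and 1/(n+1) get arbitrarily close,
# yet the outputs always differ by exactly 1.
for n in [1, 10, 100, 1000]:
    x, y = Fraction(1, n), Fraction(1, n + 1)
    print(n, float(abs(x - y)), abs(h(x) - h(y)))

# Example 4: on (1, 2) the output gap never exceeds the input gap.
grid = [1 + Fraction(k, 100) for k in range(1, 100)]   # sample points of (1, 2)
worst_ratio = max(abs(h(x) - h(y)) / abs(x - y)
                  for x in grid for y in grid if x != y)
print(float(worst_ratio))
```

Near 0 the input gap shrinks while the output gap stays at 1. On (1, 2) the ratio of output gap to input gap stays below 1 on the sampled points, matching the bound $|h'(x)| < 1$.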


A uniformly continuous function is also continuous (you will prove this in
Exercise 14.1.2). Loosely speaking, this is a statement having the form "More ⇒
Less," which should not be surprising. But we know that a continuous function
may not be uniformly continuous, so the statement "Continuous ⇒ Uniformly
Continuous" is false. The latter is of the form "Less ⇒ More," which is
something we don't find to be true very often. Our goal is to find additional
conditions that will allow us to conclude that a continuous function is uniformly
continuous. We want to supply the missing part of the statement: "Less and ???
⇒ More." Much mathematical activity is devoted to just this sort of problem.
The last two examples indicate that some condition on the domain of the 
function might be helpful here. The following is the most important theorem on 
the subject and the only one we will prove. 


THEOREM 14.2: If $f: A \to \mathbb{R}$ is continuous and A is compact, then f is
uniformly continuous on A.


We will examine this proof at some length, not only to prove the result, but as an 
illustration of how hard analysis is often accomplished. The tools we have 
available (continuity and compactness) give us specific and limited information. 
We will make some observations about the structure of the proof and the tools 
we have and gradually come to see what those tools will do for us and how to 
make them do it. This is not the forward-backward method per se, but it is good 
practice in the assembly of a complicated proof. (What follow are some ideas. 
The proof comes later.) 


• The definition of uniform continuity (the conclusion we seek) is quantified
$\forall\varepsilon\,\exists\delta$. We know that the proof must begin: "Let $\varepsilon > 0$ be given," and end with the
selection of a number $\delta$.


• The $\varepsilon$ we are given (in the definition of uniform continuity) is a measure
of the distance between outputs of f. We can use it the same way in the definition
of (ordinary) continuity.


• By the definition of continuity, there is, for each $a \in A$, a number $\delta(a)$ so
that $|f(x) - f(a)| < \varepsilon$ when $x \in A$ and $|x - a| < \delta(a)$. (This is generally all we get
from the definition of continuity.)


• To bring compactness into the picture, we need to find an open cover of A.
The definition of continuity gives us, for each $a \in A$, an interval, $(a - \delta(a), a + \delta(a))$.
These intervals are an open cover of A.


• Having found an open cover of A, we should immediately find a finite
subcover: There is a finite collection $\{a_1, \ldots, a_n\}$ with
$A \subseteq \bigcup_{i=1}^{n} (a_i - \delta(a_i), a_i + \delta(a_i))$.


• Considering how we obtained the intervals we've just found, we see that if
$x \in (a_i - \delta(a_i), a_i + \delta(a_i)) \cap A$, then $|f(x) - f(a_i)| < \varepsilon$.


• The last inequality looks something like what we want. Each $x \in A$ is in
one of the intervals $(a_i - \delta(a_i), a_i + \delta(a_i))$, and so one of these inequalities applies
to each such x. But the inequality holds only if one of the inputs to the function f
is one of the points $a_1, \ldots, a_n$. Uniform continuity doesn't allow such a
restriction. Perhaps we can just take two points, say x and y, both in the interval
$(a_i - \delta(a_i), a_i + \delta(a_i))$. Then

$$\begin{aligned}
|f(x) - f(y)| &= |f(x) - f(a_i) + f(a_i) - f(y)| \\
&\le |f(x) - f(a_i)| + |f(a_i) - f(y)| \\
&< 2\varepsilon.
\end{aligned}$$

This looks even more like what we want, but there is still a restriction on x and y
that isn't of the form $|x - y| < \delta$ (x and y have to be in the same one of these
intervals).


• Let us change our tack a bit. The finite collection of intervals we have
found gives us a collection of $\delta$'s. These are distances measured in the domain of
the function. We need just one $\delta$. Let's say $\delta = \min\{\delta(a_i)\}$. What can we
conclude if we ask only that $|x - y| < \delta$? Unfortunately, we can have $|x - y| < \delta$
without having x and y in the same one of the intervals we found. If x and y
aren't together in one of these intervals, there doesn't seem to be much we can
say about $|f(x) - f(y)|$.


• We need to find a way to force the numbers x and y to be close enough
together so that we can be sure they lie in the same interval. We used the
distances obtained from the definition of continuity to produce the $\delta$ in our last
observation (which measures how far apart x and y are). Perhaps we need only
pick a smaller value for $\delta$. This is the last idea we will need, and we can
assemble the proof.


PROOF: Let $\varepsilon > 0$ be given. For each $a \in A$, let $\delta(a)$ be such that

$$(x \in A \text{ and } |x - a| < \delta(a)) \Rightarrow |f(x) - f(a)| < \varepsilon/2$$

(the $\varepsilon/2$ will take care of the $2\varepsilon$ we found in the third-from-last •). The
collection of intervals $\{(a - \delta(a)/2, a + \delta(a)/2)\}$ is an open cover of A (dividing
by 2 here makes the distances on the x-axis smaller). Since A is compact, there is
a finite set $\{a_1, \ldots, a_n\}$ of elements of A, with

$$A \subseteq \bigcup_{i=1}^{n} (a_i - \delta(a_i)/2,\ a_i + \delta(a_i)/2).$$

Let $\delta = \min\{\delta(a_i)/2\}$. Suppose x and y are elements of A such that $|x - y| < \delta$ and let $a_i$ be such
that $|y - a_i| < \delta(a_i)/2$ (there is such an $a_i$ for each y, since
$\{(a_i - \delta(a_i)/2, a_i + \delta(a_i)/2) : i = 1, \ldots, n\}$ is a cover of A). Then

$$\begin{aligned}
|x - a_i| &\le |x - y| + |y - a_i| \\
&< \delta + \delta(a_i)/2 \\
&\le \delta(a_i)/2 + \delta(a_i)/2.
\end{aligned}$$

(The idea "if x is close enough to y and y is close enough to $a_i$, then x is close
enough to $a_i$" was the last piece of the puzzle to fall into place.) Since $|x - a_i| < \delta(a_i)$
and $|y - a_i| < \delta(a_i)$, we have

$$\begin{aligned}
|f(x) - f(y)| &\le |f(x) - f(a_i)| + |f(a_i) - f(y)| \\
&< \varepsilon/2 + \varepsilon/2 \\
&= \varepsilon. \qquad ∎
\end{aligned}$$


The examples show that a function can be uniformly continuous on a domain
that is not compact, and so Theorem 14.2 is not the whole story. Still, "f is
continuous on a closed bounded interval" is such a common hypothesis in
calculus that this result is still extremely useful.


COROLLARY 14.3: If $f: [a, b] \to \mathbb{R}$ is continuous, then it is uniformly
continuous. ∎
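The corollary applies even to functions whose derivative is unbounded, such as $f(x) = \sqrt{x}$ on [0, 1], where the Mean Value argument of Example 14.1.1 is not available. For this particular function the choice $\delta = \varepsilon^2$ works, because of the standard inequality $|\sqrt{x} - \sqrt{y}| \le \sqrt{|x - y|}$ (a fact added here for illustration, not taken from the text). The sketch below, with the hypothetical helper `max_output_gap`, checks this on a grid.

```python
import math

def max_output_gap(delta, steps=2000):
    """Largest |sqrt(y) - sqrt(x)| found on a grid of [0, 1] with y = x + delta."""
    worst = 0.0
    for k in range(steps + 1):
        x = k / steps
        y = x + delta
        if y <= 1.0:
            worst = max(worst, abs(math.sqrt(y) - math.sqrt(x)))
    return worst

for eps in [0.5, 0.1, 0.01]:
    delta = eps ** 2          # candidate delta; this choice is specific to sqrt
    print(f"eps = {eps}   delta = {delta}   worst gap = {max_output_gap(delta):.6f}")
```

The worst gap occurs at x = 0, where it equals $\varepsilon$ up to rounding, and it does not depend on any other location in [0, 1]. Inputs strictly closer than $\delta$ therefore give outputs strictly closer than $\varepsilon$.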


EXERCISES 14.1 


1. (a) Show that the function f(x) = mx + b is uniformly continuous for any
value of m.


(b) A straight line whose slope is not 0 is an unbounded function, and so (a) 
shows that a function need not be bounded to be uniformly continuous. 
Show, however, that a uniformly continuous function whose domain is a 
bounded set must be bounded. 


(c) Show that $f(x) = x^2$ is uniformly continuous if its domain is a bounded set
(compact or not).


(d) Show that the conclusion in (c) is true of any polynomial.


2. Show that a uniformly continuous function is continuous.


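3. Discuss what happens if we remove the dependence of $\delta$ on $\varepsilon$ in the
definition of continuity. What sort of functions satisfy the statement:

$$\forall a \in A\ \exists \delta > 0 \ni \forall \varepsilon > 0\ \big((x \in A \text{ and } |x - a| < \delta) \Rightarrow |f(x) - f(a)| < \varepsilon\big)?$$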


4. (a) Let $A \subseteq \mathbb{R}$. Show that $f: A \to \mathbb{R}$ is continuous if and only if the following
holds: For any neighborhood of 0, say V, and any $a \in A$, there is a
neighborhood of 0, say $U_{a,V}$, such that

$$f((a + U_{a,V}) \cap A) \subseteq f(a) + V$$

(remember that $x + S = \{x + s : s \in S\}$).

(b) Let $A \subseteq \mathbb{R}$. Show that $f: A \to \mathbb{R}$ is uniformly continuous if and only if the
following holds. For any neighborhood of 0, say V, there is a neighborhood
of 0, say $U_V$, such that $f((a + U_V) \cap A) \subseteq f(a) + V$ for all $a \in A$ [note that $U_V$
can't depend on a]. (This is a topological characterization of uniform
continuity. It's not entirely topological, though, since it contains a reference
to addition.)


5. (a) If $f: \mathbb{R} \to \mathbb{R}$ is everywhere differentiable and $f'(x)$ is bounded, show that
f is uniformly continuous.


(b) Give an example of a uniformly continuous function that is not 
everywhere differentiable. (Uniform continuity does not imply 
differentiability.) 


(c) Give an example of a uniformly continuous, differentiable function with 
an unbounded derivative. 


6. (a) Show that the function h(x) = 1/x is uniformly continuous on any set A
such that $0 \notin A^-$ ($A^-$ is the closure of A; see Exercise 8.3.2). In particular,
h(x) is uniformly continuous on $\mathbb{R} \setminus (-h, h)$ for any h > 0.


(b) If f is a continuous function, is there necessarily an open interval (or a set 
of open intervals) whose length is "small" and such that f is uniformly 
continuous on the complement of their union? (This is fairly easy.) 


(c) (This is extremely difficult.) Does the answer to (b) remain the same if the
domain of f is assumed to be bounded?


7. (a) Suppose f is uniformly continuous on $A \cup B$. Show that it is uniformly
continuous on A and on B.


(b) Suppose A and B are sets, f is uniformly continuous on A and on B, and
$A \cap B \ne \emptyset$. Show that f is uniformly continuous on $A \cup B$.


(c) Show that the result in part (a) remains true if the union of two sets is
replaced by a union of any collection of sets.


(d) Give an example of a function that is uniformly continuous on each
interval $[-n, n]$, $n \in \mathbb{N}$, but not uniformly continuous on $\mathbb{R}$. [So the result in
(b) is not true for arbitrary unions.]


(e) Give an example of a function that is uniformly continuous on two
intervals A and B but not on $A \cup B$. [The condition "$A \cap B \ne \emptyset$" can't be
dropped from part (b). Look for a simple answer!]


(f) What extra condition must be placed on the function in part (d) to
guarantee that the function is uniformly continuous on $\mathbb{R}$?


8. How can we be sure that the $\delta$ in the proof of Theorem 14.2 is positive?


9. Investigate: "If every continuous function whose domain is S is uniformly
continuous, then S is compact." (If this is true, "Continuous ⇒ Uniformly
Continuous" could have been taken to be the definition of compactness. If
you find that this is true, prove one result from Chapter 11 based on it.)


10. Show that f is continuous at a point a if and only if for any $\varepsilon > 0$ and any
number $B > 0$ there is a $\delta > 0$ so that $|x - a| < \delta \Rightarrow |f(x) - f(a)| < B\varepsilon$. Find a
place we could have used this.


11. For a function $f: A \to \mathbb{R}$, $a \in A$, and $\varepsilon > 0$ given, let

$$D(\varepsilon, a) = \sup\{\delta : |x - a| < \delta \Rightarrow |f(x) - f(a)| < \varepsilon\}.$$

(a) Show that f is continuous at a if and only if $D(\varepsilon, a) > 0$ for all $\varepsilon > 0$.


(b) Show that f is uniformly continuous if and only if $\inf_{a \in A} D(\varepsilon, a) > 0$.


12. (a) If $f: \mathbb{R} \to \mathbb{R}$ is uniformly continuous and $(x_n)$ is a Cauchy sequence,
show that $(f(x_n))$ is a Cauchy sequence.


(b) Does this remain true if it is only assumed that f is continuous?


13. (a) Give an example of a function whose domain is a bounded interval, 
where the function is continuous but not uniformly continuous. 


(b) Give an example of a function whose domain is a closed interval, where 
the function is continuous but not uniformly continuous. 


14. Is a composition of uniformly continuous functions necessarily uniformly 
continuous? What about sums? Products? Quotients? 


1 There are always errors, since we don't need to approximate things we can find exactly. A common 
misconception about applied mathematics is that the work is done when an approximating procedure has 
been found. But an estimate is useless without a statement about its accuracy. 


2 For the observant reader: The first example uses the fact that the derivative of sin(x) is bounded. You
will show in Exercise 14.1.5 that this is sufficient to guarantee uniform continuity, but not necessary. We
should not be so sure about the second example. There might be a uniform estimate that we have just not
found.


Chapter 15 


Sequences and Series of Functions 


15.1 POINTWISE CONVERGENCE 


To define convergence of a sequence of real numbers, we needed only to have a 
way to say when two of them were close together. Ideas of closeness are 
provided by both the topological and the metric structures of the real line 
(fortunately, if points are close in the metric sense they are also close in the 
topological sense, and vice versa). To extend the study of sequences to a study of 
series, we needed only to understand addition. It would seem, then, that we can 
construct a theory of sequences in any setting in which we have an idea of 
closeness, and a theory of series in any setting in which we also know how to 


add. The vector spaces $\mathbb{R}^n$, for example, fit this description (but sequences and
series in $\mathbb{R}^n$ have little to add to our knowledge at this time).


Functions are more interesting, though. We certainly know how to add
functions, but what does it mean for two of them to be close together? Consider
a sequence of functions,¹ say $(f_n)$ (we assume all the functions in a sequence
have the same domain). If x is an element of the domain, then $(f_n(x))$ is a
sequence of numbers, which might converge for some choices of x and diverge
for others. This observation leads to the following definition:


DEFINITION 15.1: The sequence $(f_n)$ converges pointwise on the set S to the
limit function f if $\lim_{n\to\infty} f_n(x) = f(x)$ for every $x \in S$.


Here we denote the limit process "$\lim_{n\to\infty}$" (we had just said "lim" when
studying sequences of constants), because there are two variables present, and
we will often want to consider limits involving x as well as n.


EXAMPLES 15.1: 1. Let $f_n(x) = x^n$. We may take the common domain to be [0,
1]. Then $(f_n)$ converges for all $x \in [0, 1]$. If we let

$$f(x) = \begin{cases} 0 & \text{if } 0 \le x < 1 \\ 1 & \text{if } x = 1, \end{cases}$$

then $f_n \to f$ pointwise on [0, 1]. Observe that f is discontinuous, even though
each of the functions $f_n$ is continuous.
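Pointwise convergence here means that each x is examined separately. The sketch below tabulates $x^n$ at a few fixed points; the function names `f_n` and `f_limit` and the sample points are choices made for this illustration.

```python
def f_n(n, x):
    """The n-th function of the sequence, f_n(x) = x**n on [0, 1]."""
    return x ** n

def f_limit(x):
    """The pointwise limit: 0 on [0, 1) and 1 at x = 1."""
    return 1.0 if x == 1 else 0.0

for x in [0.0, 0.5, 0.9, 0.99, 1.0]:
    values = [f_n(n, x) for n in (1, 10, 100, 1000)]
    print(x, values, "limit:", f_limit(x))
```

Every row with x < 1 eventually decays toward 0, but the closer x is to 1 the longer this takes, and the row for x = 1 never moves. The limit function jumps at 1 even though each $f_n$ is continuous.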


2. Let f(x) be any unbounded function. Let $f_n(x)$ have the same domain as f and
be given by

$$f_n(x) = \begin{cases} f(x) & \text{if } |f(x)| \le n \\ n & \text{if } f(x) > n \\ -n & \text{if } f(x) < -n. \end{cases}$$

Then $f_n \to f$ pointwise (because of the Archimedean property). Note that each
function $f_n$ is bounded. If we take, for example, f(x) = 1/x, each $f_n$ is Riemann
integrable, but f is not (this kind of construction is often used to define the
integral of an unbounded function).
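The truncation in this example is easy to express in code. The following is a minimal sketch; the names `truncate` and `reciprocal` are hypothetical, and the sample point x = 0.25 is chosen only because 1/x is then exactly 4.

```python
def truncate(f, n):
    """Return f_n: the function f clipped to the interval [-n, n]."""
    def f_n(x):
        return max(-n, min(n, f(x)))
    return f_n

def reciprocal(x):
    return 1 / x          # unbounded on (0, 1]

x = 0.25                  # reciprocal(x) is exactly 4.0
for n in range(1, 7):
    print(n, truncate(reciprocal, n)(x))
```

At a fixed x the clipped values agree with f(x) as soon as n is at least |f(x)|, which is where the Archimedean property enters: such an n always exists. Each clipped function is bounded by n.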


3. Let $f_n(x) = e^{-(x-n)^2}$. Here is part of the graph of $f_{10}(x)$:

[Figure: graph of $f_{10}(x)$ for x between 8 and 12.]

As n increases, this "bump" moves to the right. The value of $f_n(x)$ for a particular
(large, positive) value of x begins near 0, increases to near 1, and then decreases,
approaching 0 as $n \to \infty$. If x < 1, $f_n(x)$ simply decreases to 0. For any $x \in \mathbb{R}$,
$\lim_{n\to\infty} f_n(x) = 0$, that is, $f_n$ converges pointwise to the function f(x) = 0. Note,
however, that for any particular value of n, $f_n(n) = 1$. (Roughly speaking, this
means that there are always points on the graphs that are "far away" from where
they will be in the limit.)
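The behavior described in this example can be tabulated. The sketch below assumes the bump has the form $e^{-(x-n)^2}$, which matches the properties stated in the example (height 1 at x = n, decay to 0 at each fixed x); the exact formula is an assumption of this sketch, as is the function name `bump`.

```python
import math

def bump(n, x):
    """Assumed form of f_n: exp(-(x - n)**2), a bump of height 1 centred at x = n."""
    return math.exp(-(x - n) ** 2)

x = 5.0
print([round(bump(n, x), 6) for n in range(1, 11)])   # values at the fixed point x = 5
print([bump(n, float(n)) for n in (1, 10, 100)])      # values at the moving peak x = n
```

At the fixed point the values rise to 1 when n reaches 5 and then die away, while the value at the peak x = n is 1 for every n.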


4. Let $f_n(x) = (1 + x/n)^n$. We know from calculus that $\lim_{n\to\infty} f_n(x) = e^x$. Below
are the graphs of $f_{10}(x)$ and $e^x$ drawn to the same scale. Since $f_n$ is a polynomial
for each n and $\lim_{n\to\infty} f_n(x) = e^x$ is an exponential function, we have
$\lim_{x\to-\infty} \lim_{n\to\infty} f_n(x) = 0$ while $\lim_{x\to-\infty} |f_n(x)| = \infty$ for all n. This is an extreme case of
the phenomenon seen in Example 3. In this case there are always points on any
of the graphs $y = f_n(x)$ that are extremely far from their limits.


[Figure: two panels, $y = f_{10}(x)$ and $y = e^x$, each drawn for x between −20 and 0.]
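Both halves of Example 4 can be seen numerically. In the sketch below the sample values of n and x are arbitrary choices made for illustration.

```python
import math

def f_n(n, x):
    """The polynomial f_n(x) = (1 + x/n)**n."""
    return (1 + x / n) ** n

# For a fixed x the values approach e**x as n grows.
x = -2.0
print([f_n(n, x) for n in (10, 100, 10_000)], math.exp(x))

# For a fixed n the polynomial is unbounded as x -> -infinity,
# even though e**x tends to 0 there.
n = 10
print([abs(f_n(n, x)) for x in (-20.0, -200.0, -2000.0)])
```

The first line settles near $e^{-2}$. The second grows without bound, so for each fixed n there are points far to the left where $f_n(x)$ is nowhere near $e^x$.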


5. The derivative of the function f(x) may be thought of as the pointwise limit of
the sequence $F_n(x) = n[f(x + 1/n) - f(x)]$ (be sure you see why this is so). If we
can find any property that must carry over from the functions in a sequence to
the limit function, this property must hold for any function that is the derivative
of another. Unfortunately, as these examples indicate, such properties are hard to
come by. We will discuss some positive results along these lines in Chapter 18.
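For a concrete instance of the sequence $F_n$, take $f(x) = x^2$. Then $F_n(x) = n[(x + 1/n)^2 - x^2] = 2x + 1/n$, which tends to $2x = f'(x)$. The sketch below confirms the algebra with exact rational arithmetic; the helper names are hypothetical.

```python
from fractions import Fraction

def difference_quotient(f, n):
    """Return F_n, where F_n(x) = n * (f(x + 1/n) - f(x))."""
    step = Fraction(1, n)
    def F_n(x):
        return n * (f(x + step) - f(x))
    return F_n

def square(x):
    return x * x

x = Fraction(3)
for n in (1, 10, 100, 1000):
    print(n, difference_quotient(square, n)(x))   # equals 2*x + 1/n exactly
```

Here $F_n$ is simply the difference quotient of f with increment 1/n, so its limit at x, when it exists, is the derivative at x.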


EXERCISES 15.1 


1. Explain how the Archimedean property is used in Example 15.1.2. 


2. (a) Explain how the limit of the sequence $F_n$ in Example 15.1.5 represents
the derivative of f.


(b) Draw some examples that illustrate this limit. If you have access to a
computer-graphing program, you can use virtually any function for f; if you
do not, try $f(x) = x^2$, $f(x) = x^3$, and $f(x) = e^x$.


15.2 UNIFORM CONVERGENCE 


We have seen that a sequence of continuous functions can converge to a
discontinuous function, a sequence of bounded functions can converge to an
unbounded function, and a sequence of integrable functions can converge to a 
function that is not integrable. In Exercise 15.2.15, you will give an example of a 
sequence of differentiable functions whose limit is not differentiable. One of the 
most important topics in the study of sequences of functions is the question of 
when the properties of the functions in a sequence must carry over to the limit 
function. Evidence would seem to suggest that the answer is "Not very often." 
This is our own fault, in part, since we've settled for an easy definition of 
convergence. 


In Definition 15.1 a sequence of functions is taken to be no more than a
collection of sequences of numbers. What we really need is an idea of what it
means for two functions to be close together. Consider the first example again,
where we saw that the sequence with $f_n(x) = x^n$ converges pointwise for $x \in [0, 1]$
to the function f, where f(x) = 0 for $x \in [0, 1)$ and f(1) = 1. This sequence also
converges if we take the domain to be [0, 0.6], but the situations are very
different.


[Figure: two panels showing graphs of $x^n$, with domain [0, 1] on the left and domain [0, 0.6] on the right.]


On the left, as in Example 15.1.3, there are values of x (those very close to 1)
where, for a given n, the value of $f_n(x)$ is not close to its limit. In fact, for any n,
we can find values of x < 1 so that $x^n$ is as close to 1 as we like (and so not close
to 0). We can do this even though $\lim_{n\to\infty} f_n(x) = 0$ for any such x. In the right hand
picture, though, the values of $x^n$ are all closer to 0 than $(0.6)^n$. If we choose n so
that $(0.6)^n$ is small, we make all the values of $x^n$ close to their limits at the same
time. In effect, the whole function $x^n$ is close to the limit function 0.


DEFINITION 15.2: The sequence $(f_n)$ converges uniformly on the set S to the
function f if, given $\varepsilon > 0$, there is a natural number $N_\varepsilon$ so that for all
$x \in S$, $|f_n(x) - f(x)| < \varepsilon$ whenever $n > N_\varepsilon$.


Note that $N_\varepsilon$ can depend on $\varepsilon$ (it would be surprising if it didn't) but does not
depend on x (this is the same sort of change that transformed "continuity" into
"uniform continuity"). In the pictures above, convergence is not uniform on the
left but is uniform on the right. We sometimes abbreviate the expression "$f_n(x)$
converges uniformly to f(x)" by "$f_n \to f$ uniformly." The condition $|f_n(x) - f(x)| < \varepsilon$
in the definition may be rewritten: $f(x) - \varepsilon < f_n(x) < f(x) + \varepsilon$, and this is
something we can draw (see below). The sequence $(f_n)$ converges uniformly to f
if the entire graph of $f_n$ can be made to lie between the graphs of $f(x) - \varepsilon$ and
$f(x) + \varepsilon$ by making n large. As a side benefit, we find that this idea of uniform
closeness can be easily translated into a way of measuring the size of a function
and the distance between two functions. You will show in Exercise 15.2.9 that
the function in Definition 15.3 is really a norm (in the sense of linear algebra), as
its name suggests.


DEFINITION 15.3: The supremum norm (or uniform norm) of a function
$f: S \to \mathbb{R}$ is given by $\|f\|_\infty = \sup_{x \in S} |f(x)|$.


Combining Definitions 15.2 and 15.3, we see that $f_n$ converges to f uniformly if
and only if $\lim_{n\to\infty} \|f_n - f\|_\infty = 0$ (hence the name "uniform norm"). Consider
again the sequence $f_n(x) = x^n$, and the limit function f defined in Example
15.1.1. On the interval [0, 1], we have $\|f_n - f\|_\infty = 1$ for all n, while on [0, 0.6] we
have $\|f_n - f\|_\infty = (0.6)^n \to 0$. We see again that the convergence is uniform in the
latter case, while it is not in the former. (The notation for the supremum norm
really should contain some reference to the domain of the function, but we will
be able to tell this by context.)
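The two values of the supremum norm quoted above can be approximated on a grid. This is a rough sketch only: a finite grid underestimates a supremum that is not attained, so on [0, 1] the printed numbers fall slightly short of 1. The names `limit_function` and `sup_distance` and the grid size are assumptions made here.

```python
def limit_function(x):
    """Pointwise limit of x**n on [0, 1]: 0 for x < 1 and 1 at x = 1."""
    return 1.0 if x == 1.0 else 0.0

def sup_distance(n, right_end, steps=10_000):
    """Grid estimate of sup |x**n - f(x)| over [0, right_end]."""
    worst = 0.0
    for k in range(steps + 1):
        x = right_end * k / steps
        worst = max(worst, abs(x ** n - limit_function(x)))
    return worst

for n in (1, 10, 100):
    print(n, sup_distance(n, 1.0), sup_distance(n, 0.6))
```

On [0, 0.6] the estimate agrees with $(0.6)^n$ and shrinks quickly. On [0, 1] it stays near 1 for each of the sampled n, in line with $\|f_n - f\|_\infty = 1$.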

We seem to have found an idea of closeness that considers functions as whole 
objects. This was certainly our goal, but is the resulting idea of convergence any 
more useful than pointwise convergence? We have mentioned repeatedly that we 
might want useful properties of functions in a sequence to carry over to its limit. 
Often, uniform convergence is just what we need to achieve this. 


THEOREM 15.4: If $f_n$ is continuous on the set S for all n and $f_n$ converges to f
uniformly, then f is continuous on S.


This proof is technical but can be seen easily with the picture below. The four 
dots near the lower left are all that concern us now. We want to show that the y- 
coordinates of the top two [which are f(x) and f(a)] can be made close together 
by making x and a close together. But now each of the top two dots can be made 
close to the one below it because the sequence of functions converges. This can 
be made to happen simultaneously because the convergence is uniform. Finally, 
the y-coordinates of the bottom dots can be made close together because $f_n$ is
continuous for all n. Notice in this case how directly these thoughts can be
translated into a precise argument.


[Figure: the graphs of $y = f(x) + \varepsilon$, $y = f(x)$, and $y = f(x) - \varepsilon$, with four marked points.]


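PROOF: Let $\varepsilon > 0$ be given. Let $n_0$ be such that $\|f_{n_0} - f\|_\infty < \varepsilon/3$. (This is true
for all but finitely many subscripts, but we only need one.) In particular, notice
that $|f(x) - f_{n_0}(x)| < \varepsilon/3$ and $|f_{n_0}(a) - f(a)| < \varepsilon/3$. Since $f_{n_0}$ is continuous, there is a
$\delta > 0$ so that $|f_{n_0}(x) - f_{n_0}(a)| < \varepsilon/3$ whenever $|x - a| < \delta$. If $|x - a| < \delta$, we have

$$\begin{aligned}
|f(x) - f(a)| &\le |f(x) - f_{n_0}(x)| + |f_{n_0}(x) - f_{n_0}(a)| + |f_{n_0}(a) - f(a)| \\
&< \varepsilon/3 + \varepsilon/3 + \varepsilon/3 \\
&= \varepsilon. \qquad ∎
\end{aligned}$$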


This proof is another example of one of the most important techniques of hard 
analysis. The quantity that interests us—|f(x) — f(a)| in this case—is seen as a 
sum of other things, each of which we can estimate. It often happens (as it does 
here) that the information needed to estimate the different parts of a problem 
comes from different sources. When a useful estimate has been made of one of 
the pieces in such an argument, analysts say they have "controlled" that piece 
(perhaps a bit of wishful thinking). 


We can learn much about uniform convergence by thinking about things we
don't like about the way the functions $f_n(x) = x^n$ approach their limit function f.
Here is another example. Let $x_n = 1 - 1/n$. Then $\lim x_n = 1$, and so $f(\lim x_n) = f(1)
= 1$. On the other hand, $\lim f_n(x_n) = 1/e \ne 1$. The examination of functions using
sequences is very useful (see Theorem 9.16, for example). A situation like this, 
where sequences don't seem to do what we want, is inconvenient. Again, 
uniform convergence comes to the rescue. The proof of this theorem is very 
similar to the previous one and is left as Exercise 15.2.14. It is good practice in 
the "divide and conquer" technique. 


THEOREM 15.5: If $f_n$ is continuous on a set S for all n, $f_n$ converges to f
uniformly on S, and $(x_n)$ is a sequence in S with $\lim x_n = x \in S$, then $\lim f_n(x_n) = f(x)$. ∎
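The failure that motivates this theorem, with $f_n(x) = x^n$ and $x_n = 1 - 1/n$ on [0, 1], can be observed directly. The following is a small sketch with hypothetical function names.

```python
import math

def f_n(n, x):
    """f_n(x) = x**n."""
    return x ** n

def x_n(n):
    """The sequence x_n = 1 - 1/n, which converges to 1."""
    return 1 - 1 / n

for n in (10, 100, 10_000, 1_000_000):
    print(n, f_n(n, x_n(n)))
print("1/e =", 1 / math.e)
```

The values approach 1/e rather than f(1) = 1. This does not contradict the theorem, since the convergence of $x^n$ on [0, 1] is not uniform.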


The final theorem in this section is a Cauchy criterion for sequences of 
functions. Sequences satisfying the hypothesis of this theorem are sometimes 
called "uniformly Cauchy."* This theorem establishes a version of the Cauchy 
criterion for the set of continuous functions, which means that the set of 
continuous functions is, in a sense, complete. Since the set of continuous 
functions is not linearly ordered, it doesn't make sense to talk about the Least 
Upper Bound property. Nevertheless, we are able to discuss the completeness of 
the set in terms of another part of the Big Theorem. Having six different 
characterizations of completeness allows us to pick the one that is the most 
useful. (Recall that some people take "Cauchy sequences converge" as the 
definition of completeness.) 


THEOREM 15.6: The sequence $(f_n)$ converges uniformly if and only if, given $\varepsilon
> 0$, there is a natural number $N_\varepsilon$ so that $\|f_n - f_m\|_\infty < \varepsilon$ whenever $m, n > N_\varepsilon$.


PROOF: The proof of the "only if" part of this theorem is precisely the same as
that of Theorem 10.13, and we won't repeat it. Now suppose that $(f_n)$ is
uniformly Cauchy. Since $|f_n(x) - f_m(x)| \le \|f_n - f_m\|_\infty$ for any x, the sequences
$(f_n(x))$ are all Cauchy sequences, and therefore each converges. Say $\lim_{n\to\infty} f_n(x)
= f(x)$ (we have defined f in this way). Now f is the pointwise limit of $(f_n)$, and
certainly f is the only reasonable candidate for the uniform limit of $(f_n)$. We need
to show that the convergence of $(f_n)$ to f is uniform. Let $\varepsilon > 0$ be given and let N
be such that $\|f_n - f_m\|_\infty < \varepsilon$ whenever $m, n > N$. Then $|f_n(x) - f_m(x)| \le \|f_n - f_m\|_\infty < \varepsilon$
for all x. Let n be fixed and greater than N and consider $|f_n(x) - f_m(x)|$ to be a
sequence with index m. Letting $m \to \infty$, we have $f_m(x) \to f(x)$, so that
$|f_n(x) - f_m(x)| \to |f_n(x) - f(x)| \le \varepsilon$. (There is a $\le$ symbol where we would like to have a $<$,
but this detail can be dealt with easily.) Since the last inequality is true for all x,
we see that $f_n \to f$ uniformly. ∎
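The uniform Cauchy condition can be tested on the running example $f_n(x) = x^n$. On [0, 1] one can check by hand that $\sup |x^n - x^{2n}| = 1/4$ for every n (the maximum of $t - t^2$ occurs at $t = 1/2$); this observation is added here and is not from the text. The sketch below estimates the same quantity on a grid, with the hypothetical helper `sup_gap`.

```python
def sup_gap(n, m, right_end, steps=10_000):
    """Grid estimate of sup |x**n - x**m| over [0, right_end]."""
    return max(abs((right_end * k / steps) ** n - (right_end * k / steps) ** m)
               for k in range(steps + 1))

for n in (5, 20, 80):
    print(n, sup_gap(n, 2 * n, 1.0), sup_gap(n, 2 * n, 0.6))
```

On [0, 1] the gap between $f_n$ and $f_{2n}$ stays near 1/4 however large n is, so the sequence is not uniformly Cauchy there. On [0, 0.6] the gap shrinks rapidly, consistent with uniform convergence.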


EXERCISES 15.2 


1. Recall that the derivative of a function f may be thought of as the pointwise
limit of the sequence $F_n(x) = n[f(x + 1/n) - f(x)]$.


(a) Define the phrase "uniformly differentiable." 


(b) One of the main objects of the study of sequences of functions is the 
examination of those properties of the functions in a sequence that are shared 
by the limit function. Construct and prove a theorem to the effect: "If f is
uniformly differentiable (and possibly some other hypotheses), then f' (has
some nice property)."


2. Write the definition of "$(f_n)$ converges pointwise for all x" in symbols and
compare it with the definition of uniform convergence.


3. (a) Suppose $(f_n)$ is a sequence of bounded functions and that $f_n \to f$
uniformly. Show that f is bounded.


(b) Does this remain true if the convergence is not uniform? 

4. For each of the following sequences of functions, find the set on which it 
converges pointwise, and find the limit. Describe a set, if there is one, on 
which convergence is uniform. (There can be more than one answer to the 
latter question.) 

—_ 
(a) | n ) 
(b) (sin”(x)) 


(c) (aera 


(d) (nxe™) 




‘ na” ) 
(e) ( 1+nax" 


5. (a) Show that the pointwise limit of a sequence of continuous functions need
not have the Intermediate Value property even though each of the functions
in the sequence does. (Think of a simple example.)


(b) Show that the uniform limit of a sequence of continuous functions does 
have the Intermediate Value property. (In view of Theorem 15.4, this is not 
difficult.) 


(c) Establish the result in (b) without referring to Theorem 15.4. 


6. (a) Suppose f is uniformly continuous and let $f_n(x) = f(x + 1/n)$. Show that
$f_n \to f$ uniformly.


(b) Is this true if f is assumed only to be continuous?


7. If $f_n$ is uniformly continuous for all n and $f_n \to f$ uniformly, is f
necessarily uniformly continuous?


8. (a) Give an example to show that it is possible for a sequence of continuous
functions to converge to a continuous function without the convergence
being uniform.


(b) If f is any function at all, show that the sequence defined by
$f_n(x) = f(x)/n$ converges pointwise to the zero function.


(c) Show that the convergence in (b) is uniform if f is bounded.


(d) Show that it is possible to have a sequence of discontinuous functions 
converging (pointwise or uniformly) to a continuous function. 


9. Show that $|f(x)| \le \|f\|_\infty$ for all x in the domain of f.


10. Show that the supremum norm is in fact a norm in the sense of linear
algebra: If f and g are bounded functions and $\lambda \in \mathbb{R}$, then
(i) $\|f\|_\infty \ge 0$ for all f, and $\|f\|_\infty = 0$ if and only if f = 0
(ii) $\|\lambda f\|_\infty = |\lambda| \, \|f\|_\infty$
and (iii) $\|f + g\|_\infty \le \|f\|_\infty + \|g\|_\infty$.


11. Let $f : [a, b] \to \mathbb{R}$ be continuous and define

\[
\|f\|_p = \left( \int_a^b |f(x)|^p \, dx \right)^{1/p}.
\]

(a) Show that $\|f\|_1$ and $\|f\|_2$ are norms (see the previous exercise).
(b) Show $\|f\|_p$ is a norm for any $p \ge 1$ (this is quite difficult).
(c) Show that $\lim_{p\to\infty} \|f\|_p = \|f\|_\infty$ (hence the notation
$\|f\|_\infty$).


12. (a) Show that if $f_n \to f$ uniformly, then $f_n \to f$ pointwise.


(b) Show that the pointwise limit of a sequence of functions is the only
reasonable candidate for the uniform limit, as claimed in the chapter. That is,
if $f_n \to f$ pointwise and $g \ne f$, it can't happen that $f_n \to g$ uniformly.


13. Show that $f_n \to f$ uniformly if and only if $\|f_n - f\|_\infty \to 0$.


14. Prove Theorem 15.5. 


15. (a) Construct an example of a sequence of differentiable functions that 
converges uniformly to a nondifferentiable function. (We will see just how 
dramatically this can happen in Chapter 20.) 


(b) Let $C^1[0, 1]$ be the set of functions $f : [0, 1] \to \mathbb{R}$ such that
$f'$ is continuous. Show that $C^1[0, 1]$ is a vector space.

(c) Show that $\|f\|_{C^1} = \|f\|_\infty + \|f'\|_\infty$ is a norm on $C^1[0, 1]$.

(d) If $f_n \in C^1[0, 1]$ for all n and $\|f_n - f\|_{C^1} \to 0$, show that f is
differentiable and $f_n' \to f'$ pointwise.

(e) Is the convergence you showed in (d) necessarily uniform? 

(f) If $f_n$ and f are as in (d), is f necessarily in $C^1[0, 1]$? (The only
difficult question is whether $f'$ is continuous.)


16. (a) Suppose $(f_n)$ is a sequence of functions on [0, 1] that converges to a
function f, but that the convergence is not uniform. Show that there must be a
sequence $(x_n)$ in [0, 1] so that $x_n \to x \in [0, 1]$ but $f_n(x_n)$ doesn't
converge to f(x).


(b) Show that the sequence $(f_n)$ with $f_n(x) = n^2 x^2 e^{-nx}$ converges to 0
for all $x \in [0, 1]$ but that the convergence is not uniform. Find a sequence as
guaranteed in (a).


17. (a) Define the phrases "The series $\sum f_n(x)$ converges" and "The series
$\sum f_n(x)$ converges uniformly."
(b) Show that the series $\sum f_n$ converges uniformly if and only if, for any
$\varepsilon > 0$, there is an $N \in \mathbb{N}$ so that
$\|f_m + \cdots + f_n\|_\infty < \varepsilon$ whenever $n \ge m > N$.

18. (a) If $\sum a_n$ is an absolutely convergent series, show that
$\sum a_n \sin(nx)$ converges absolutely and uniformly.
(b) What properties of sin(x) did you actually use in (a)? If $\sum a_n$ is an
absolutely convergent series, what conditions on the functions $f_n$ will
guarantee that $\sum a_n f_n(x)$ converges absolutely and uniformly?


19. (a) Consider the set of ordered n-tuples $\vec{x} = (x_1, x_2, \ldots, x_n)$
with distance given by $d(\vec{x}, \vec{y}) = \max\{|x_i - y_i| : i = 1, \ldots, n\}$.
If
\[
(\vec{x}^{(k)}) = \left( x_1^{(k)}, x_2^{(k)}, \ldots, x_n^{(k)} \right)
\]
is a sequence of these objects, show that
$\vec{x}^{(k)} \to \vec{a} = (a_1, a_2, \ldots, a_n)$ if and only if
$x_i^{(k)} \to a_i$ for all i. (Take some time to sort out the notation here. This
is why we didn't spend any time talking about sequences of n-tuples.)


(b) Explain why uniform convergence and pointwise convergence are the 
same for n-tuples. 


(c) Repeat part (a) with the distance between n-tuples given by

\[
d(\vec{x}, \vec{y}) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}.
\]


(d) Show that a sequence of n-tuples converges in the sense given by the 
distance measurement in (a) if and only if it converges in the sense given by 
the distance measurement in (c). 


(e) Draw some pictures to compare these two concepts of distance as they 
apply to ordered pairs. Explain (d) in terms of these pictures. 


15.3 TOPOLOGY OF FUNCTION SPACES 


In Chapter 8 we observed that the topology of a set could be understood by a
study of its convergent sequences. Now we have an opportunity to try this out.
We have described the convergent sequences in the space of continuous
functions but have not discussed what it means for a set of functions to be open.
We will examine only one aspect of this large problem. We will ask what it
means for a set in the space of continuous functions to be compact.


First we must decide which characterization of compactness will be most 
useful. To use the definition of compactness would require us to consider 
functions whose domains are sets of functions. Let's avoid this if we can. To use 
the Heine-Borel theorem we would have to develop a full theory of open sets, 
which we've said we don't want to do. Still another characterization of 
compactness is given in Exercise 11.3.8: A set is compact if every sequence in it 
has a subsequence that converges to an element of the set.³ This involves
concepts we know about and seems to be just what we need. 


We want to avoid discussing the open subsets of the space of functions at any
length, but they are not difficult to define. We can measure the size of a function
(its supremum norm) and the distance between two of them (the distance
between f and g is $\|f - g\|_\infty$). We can then define
$\varepsilon$-neighborhoods, and so open sets. We will think about this only long
enough to do an example (giving us an idea what we are up against).


The circle of proofs surrounding the Heine-Borel theorem allows us to 
conclude that a compact set of continuous functions must be closed and 
bounded. Unlike the situation on the real line, though, this is not enough to 
guarantee that a set is compact. 


EXAMPLES 15.3: 1. Let $\mathcal{B}$ be the set of continuous functions on the
interval [0, 1] with $\|f\|_\infty \le 1$. This set is bounded in the sense that the
supremum norms of the functions in it are all bounded by the same number. We
will show that the complement of $\mathcal{B}$ is open. If $f \notin \mathcal{B}$,
there is an $x_0 \in [0, 1]$ so that $|f(x_0)| > 1$. Let
$\varepsilon = (|f(x_0)| - 1)/2$. If g is any continuous function with
$\|f - g\|_\infty < \varepsilon$, then $|g(x_0)| > 1$, and so $g \notin \mathcal{B}$.
Thus the $\varepsilon$-neighborhood of f, $\{g : \|f - g\|_\infty < \varepsilon\}$, is
contained in the complement of $\mathcal{B}$, and so $\mathcal{B}$ is closed.
Consider the sequence $(x^n)$, which is contained in $\mathcal{B}$. The limit of
this sequence is not continuous, hence is not an element of $\mathcal{B}$, and any
subsequence has the same limit. No subsequence converges to an element of
$\mathcal{B}$, and $\mathcal{B}$ can't be compact (even though it is closed
and bounded).
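
A quick numerical look at this example may help (a sketch only, not part of the argument; the grid-based helper `sup_norm_gap` is a hypothetical name introduced here, and a finite grid merely approximates the supremum). For the functions x^n, the supremum-norm distance between the n-th and the 2n-th terms is 1/4 for every n, attained where x^n = 1/2, so no subsequence can be uniformly Cauchy.

```python
def sup_norm_gap(n, m, grid_size=100_001):
    """Approximate the sup over [0, 1] of |x**n - x**m| on a uniform grid."""
    step = 1.0 / (grid_size - 1)
    return max(abs((k * step) ** n - (k * step) ** m) for k in range(grid_size))

# The distance between x**n and x**(2n) does not shrink as n grows.
for n in (1, 5, 25):
    print(n, sup_norm_gap(n, 2 * n))
```

Each printed distance should sit very close to 0.25, which is the behavior a uniformly convergent subsequence could not have.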


Evidently, being closed and bounded is not sufficient to guarantee that a set of
functions is compact. Another condition must be met, which is defined below.
This condition is very subtle, and it is easy to see how it might have escaped our
attention.


DEFINITION 15.7: A collection of functions is equicontinuous on the set S if,
for each $\varepsilon > 0$, there is a $\delta > 0$ so that
$|f(x) - f(y)| < \varepsilon$ whenever f is in the collection, x and y are in S,
and $|x - y| < \delta$.
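
To get a feel for the definition, the sketch below measures how much a function can change across a gap of width δ. The family x^n on [0, 1] and the helper name `worst_jump` are choices made here for illustration, not taken from the text. The quantity computed is |f_n(1) - f_n(1 - δ)| = 1 - (1 - δ)^n: for a single n it shrinks with δ, but for a fixed δ it approaches 1 as n grows, so no single δ serves the whole family.

```python
def worst_jump(delta, n_values):
    """Largest |f_n(1) - f_n(1 - delta)| over the given n, where f_n(x) = x**n."""
    return max(1.0 - (1.0 - delta) ** n for n in n_values)

for delta in (0.1, 0.01, 0.001):
    single = worst_jump(delta, [3])               # one function: the jump is small
    family = worst_jump(delta, range(1, 20001))   # many functions: the jump is near 1
    print(delta, single, family)
```

Each x^n is uniformly continuous on [0, 1], yet the second column stays near 1 however small δ is taken, which suggests that this family is not equicontinuous.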


Every function in an equicontinuous collection is uniformly continuous. This 
uniformity extends through the collection in the sense that the 6 to be found 
doesn't depend on either the choice of x and y or the choice of the function f (the
collection is uniformly uniformly continuous!). Remember that on the real line, 
the union of the range of a convergent sequence and its limit is a compact set. 
The familiar ring of the following theorem indicates that equicontinuity puts us 
on the right track. 


THEOREM 15.8: If $(f_n)$ is a uniformly convergent sequence of continuous
functions defined on a compact set with $f_n \to f$, then $\{f_n\} \cup \{f\}$ is
equicontinuous.


PROOF: We will first show there is a number N so that $\{f_n : n > N\}$ is
equicontinuous. Let $\varepsilon > 0$ be given. Since $(f_n)$ is uniformly
convergent, we can find N so that $\|f_n - f\|_\infty < \varepsilon/3$ whenever
n > N. Note that f is uniformly continuous (it is continuous by Theorem 15.4, and
its domain is compact) and let $\delta > 0$ be such that
$|f(x) - f(y)| < \varepsilon/3$ whenever $|x - y| < \delta$. Mimicking the proof of
Theorem 15.4, we see that $|f_n(x) - f_n(y)| < \varepsilon$. This inequality holds
for all x and y with $|x - y| < \delta$ and, more importantly, for all n > N. Thus
$\{f_n : n > N\}$ is equicontinuous. You will show in Exercise 15.3.4 that (i) a
finite set of uniformly continuous functions is equicontinuous (so
$\{f_n : n \le N\}$ is equicontinuous) and (ii) the union of two equicontinuous
sets is equicontinuous. Therefore
$\{f_n\} \cup \{f\} = \{f_n : n \le N\} \cup \{f_n : n > N\} \cup \{f\}$ is
equicontinuous. ∎


Our goal is to obtain a theorem that says that a set of functions that is closed, 
bounded, and equicontinuous is (sequentially) compact. We will need the 
following lemma. 


LEMMA 15.9: Suppose $S \subseteq \mathbb{R}$ is compact and T is a dense subset of
S. If $\{f_n\}$ is equicontinuous on S and $f_n \to f$ uniformly on T, then
$f_n \to f$ uniformly on S.


PROOF: We will show that $(f_n)$ is uniformly Cauchy on S. Let $\varepsilon > 0$
be given and $\delta > 0$ be such that $|f_n(x) - f_n(y)| < \varepsilon/3$ for all n
whenever $|x - y| < \delta$ (from the definition of equicontinuity). Since S is
compact and is the closure of T, there are elements $t_1, \ldots, t_p$ of T so that
every element of S is within a distance $\delta$ of one of these t's. Since $(f_n)$
converges uniformly on T, there is an N so that
$|f_n(t_k) - f_m(t_k)| < \varepsilon/3$ for all m, n > N and all $t_k$. Let
$x \in S$ and let k be such that $|x - t_k| < \delta$. For all n,
$|f_n(x) - f_n(t_k)| < \varepsilon/3$ (by equicontinuity). Then, for m, n > N,

\[
\begin{aligned}
|f_m(x) - f_n(x)|
&\le |f_m(x) - f_m(t_k)| + |f_m(t_k) - f_n(t_k)| + |f_n(t_k) - f_n(x)| \\
&< \varepsilon/3 + \varepsilon/3 + \varepsilon/3 = \varepsilon.
\end{aligned}
\]

Since this is true for all $x \in S$, we have $\|f_m - f_n\|_\infty \le \varepsilon$
for m, n > N, and so, by Theorem 15.6, $(f_n)$ converges uniformly on S. ∎


Now we can establish our main theorem. The proof of this theorem uses a 
technique known as "diagonalization" (or "Cantor diagonalization," owing to its 
pictorial resemblance to the proof of the uncountability of the real numbers). 
This is such a familiar and useful technique that authors of research publications 
tend to say "the proof is completed by a diagonalization argument." To be 
precise, we will say that a set of functions is closed if it satisfies Theorem 9.15
and is bounded if there is a number B so that $\|f\|_\infty \le B$ for all f in the
set.


THEOREM 15.10: (The Arzelà-Ascoli Theorem) Let $\mathcal{F}$ be a collection of
functions defined on a compact set S. If $\mathcal{F}$ is closed, bounded, and
equicontinuous, then every sequence in $\mathcal{F}$ has a subsequence that
converges to an element of $\mathcal{F}$.


PROOF: Let $(f_n)$ be a sequence in $\mathcal{F}$ and let
$T = \{t_1, t_2, \ldots\}$ be a countable subset of S whose closure is S. Now
$\{f_n\} \subseteq \mathcal{F}$, and so $\{f_n\}$ is bounded (that is,
$\{\|f_n\|_\infty\}$ is bounded). Thus $(f_n(t_1))$ is a bounded sequence of real
numbers, and so has a convergent subsequence by the Bolzano-Weierstrass theorem
for sequences. We denote this subsequence $(f_{n,1}(t_1))$. Similarly,
$(f_{n,1}(t_2))$ is bounded (watch the subscripts closely), and so it has a
convergent subsequence, say $(f_{n,2}(t_2))$. Note that $(f_{n,2}(t_1))$ also
converges since $(f_{n,2}(t_1))$ is a subsequence of $(f_{n,1}(t_1))$. We continue
in this way, to produce a collection of sequences, each of which is a subsequence
of the previous one and each converging at all the points that gave rise to those
before. We may keep track of these sequences in a table:

\[
\begin{array}{ccccc}
f_{1,1}(t_1) & f_{2,1}(t_1) & f_{3,1}(t_1) & f_{4,1}(t_1) & \cdots \\
f_{1,2}(t_2) & f_{2,2}(t_2) & f_{3,2}(t_2) & f_{4,2}(t_2) & \cdots \\
f_{1,3}(t_3) & f_{2,3}(t_3) & f_{3,3}(t_3) & f_{4,3}(t_3) & \cdots \\
\vdots & \vdots & \vdots & \vdots &
\end{array}
\]


Consider the diagonal sequence $(f_{n,n})$. This is eventually a subsequence of
$(f_{n,k})$ for any k, and so $(f_{n,n}(t_k))$ converges for all k. Call the limit
f. We show that the convergence of $f_{n,n}$ to f is uniform on T. Suppose it is
not uniform; then there is a sequence in T, say $(\tau_n)$, so that $\tau_n$
converges to $\tau \in T$ but $f_{n,n}(\tau_n)$ does not converge to $f(\tau)$.
Then there is an $\varepsilon > 0$ so that
$|f_{n,n}(\tau_n) - f(\tau)| \ge 2\varepsilon$ for infinitely many values of n.
Since $\tau \in T$, $f_{n,n}(\tau) \to f(\tau)$, and for large enough n,
$|f_{n,n}(\tau) - f(\tau)| < \varepsilon$. Putting these together, we see that
there are arbitrarily large values of n (making $\tau_n$ arbitrarily close to
$\tau$) for which $|f_{n,n}(\tau_n) - f_{n,n}(\tau)| \ge \varepsilon$,
contradicting the equicontinuity of $\mathcal{F}$. Thus $(f_{n,n})$ converges
uniformly on T. By Lemma 15.9, $(f_{n,n})$ converges uniformly on S. Since
$\mathcal{F}$ is closed, the limit is an element of $\mathcal{F}$. ∎


EXERCISES 15.3 


1. Show that the collection $\{\cos(nx) : n \in \mathbb{N}\}$ is not equicontinuous.


2. Suppose $\{f_n\}$ is a collection of functions defined on a closed interval I
such that there is a number B with $|f_n(x)| \le B$ and $|f_n'(x)| \le B$ for all
$f_n$ and all $x \in I$. Show that $\{f_n\}$ is equicontinuous.


3. (a) Verify that it is possible to select the subset T necessary for the proof of
the Arzelà-Ascoli theorem. (Hint: Make a cover of S consisting of intervals
of length 1/2; reduce to a finite subcover; pick a point of T in each of these
intervals; now make a cover of intervals of length 1/3, ...)


(b) Consider the role played by the set T in the proof of the Arzelà-Ascoli
theorem. Can the result be obtained while putting conditions on this set that 
are less demanding? 


4. (a) If f is uniformly continuous, show that {f} is equicontinuous.


(b) If F and G are equicontinuous, show that $F \cup G$ is equicontinuous.


(c) Show that a finite set of uniformly continuous functions is 
equicontinuous. 


5. (a) Does every family of continuous functions that is not equicontinuous
contain a sequence that converges to a discontinuous function? 


(b) Does the converse of the Arzela-Ascoli theorem hold? That is, if every 
sequence in a set of functions has a uniformly convergent subsequence, is the 
set necessarily equicontinuous? 


6. Show that every set of functions that has the covering property is closed and
bounded. 


7. (Here is an open-ended project.) In determining the distance between two
functions in the supremum norm, we look only at the differences between
points on the two graphs with the same x-coordinate. But this does not
measure the distance from a point on one of the graphs to the other graph.
For instance, the difference between y-coordinates of points on the graphs of
f(x) = x and g(x) = x + 1 is always 1 (and so $\|f - g\|_\infty = 1$), while the
distance from any point (x, f(x)) to the graph y = g(x) is only $1/\sqrt{2}$. We
might measure the distance between two graphs by considering the distances
between points on one graph and the other graph. Suppose f and g both have
domain D. The distance from a point (x, f(x)) to the graph of g would be given by

\[
\inf_{t \in D} \sqrt{(x - t)^2 + (f(x) - g(t))^2}.
\]

We can define the "Distance Between" the graphs of f and g to be

\[
DB(f, g) = \sup_{x \in D} \, \inf_{t \in D} \sqrt{(x - t)^2 + (f(x) - g(t))^2}.
\]


(a) Compute DB(f, g) for some specific examples. Note that it is possible to 
have $\|f - g\|_\infty$ very large while DB(f, g) is very small (if f and g are parallel
lines with very large slopes, for instance). 


(b) Should there be restrictions on the domain D to help this make more 
sense? On the functions f and g?


(c) Is $\|f\|_{DB} = DB(f, 0)$ a norm in the sense of linear algebra? If so, is it
the case that $\|f - g\|_{DB} = DB(f, g)$?


(d) If a sequence of functions "converges DB" (so $DB(f_n, f) \to 0$), does it
necessarily converge pointwise? Uniformly? 


(e) Are the converses of the results considered in (d) true?


(f) Consider "big questions" of convergence such as those posed in this 
chapter (and later in Chapter 18). For instance, if a sequence of continuous 
functions converges DB, is the limit function necessarily continuous? 


(g) DB(f, g) is certainly harder to compute than $\|f - g\|_\infty$. Assuming
satisfactory answers to these questions, is there any good reason for using
this idea of the distance between functions as opposed to the uniform norm?


15.4 THE WEIERSTRASS M-TEST 


A series of functions, as we might guess, is a special sort of sequence (it is the 
sequence of partial sums formed from another sequence). A collection of 
theorems on convergence of series of functions can be assembled by combining 
the results from the first few sections of this chapter with those in Chapter 13. 
Though the results so obtained are interesting and important, they don't have 
much new to teach us about the real numbers. Here we will prove only one 
theorem, which is important in that it connects uniform convergence of series of 
functions (that is, convergence "as functions") with convergence of series of 
numbers. The theorem has a bland but traditional name. 


THEOREM 15.11: (The Weierstrass M-Test) Suppose $(f_n)$ is a sequence of
functions with common domain D and that $(M_n)$ is a sequence of numbers with
$\|f_n\|_\infty \le M_n$ for all n. If $\sum M_n$ converges, then $\sum f_n$
converges uniformly.


PROOF: The proof uses the adaptation of Theorem 13.6 that you proved in
Exercise 15.2.17. Under the hypotheses, we have, for any m and n,

\[
\|f_m + \cdots + f_n\|_\infty \le \|f_m\|_\infty + \cdots + \|f_n\|_\infty
\le M_m + \cdots + M_n.
\]

By Theorem 13.6, the sum on the right can be made as small as we wish by
choosing m and n large enough. The Cauchy criterion for series of functions
holds, and $\sum f_n$ converges uniformly. ∎


The proof of the following corollary is simple and will be omitted. Notice the
resemblance of this result to Theorem 13.18. In this context, though, its
significance is more apparent since the series of norms (the "absolute values") is
very different from the original series of functions.


COROLLARY 15.12: If the series $\sum \|f_n\|_\infty$ converges, then the series
$\sum f_n$ converges uniformly. ∎
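
A numerical sanity check of the M-test is sketched below. Everything specific in it is an assumption made for illustration and does not come from the text: the functions f_n(x) = sin(nx)/n², the bounds M_n = 1/n², a finite grid standing in for the domain, and the helper names `block_sup` and `block_bound`. It compares the (approximate) supremum norm of a block f_m + ... + f_n with the corresponding block M_m + ... + M_n.

```python
import math

def block_sup(m, n, grid):
    """Grid approximation of the sup norm of f_m + ... + f_n, f_k(x) = sin(kx)/k**2."""
    return max(abs(sum(math.sin(k * x) / k**2 for k in range(m, n + 1)))
               for x in grid)

def block_bound(m, n):
    """M_m + ... + M_n with M_k = 1/k**2."""
    return sum(1.0 / k**2 for k in range(m, n + 1))

# 401 equally spaced points in [-pi, pi] (the grid is an approximation device).
grid = [-math.pi + 2 * math.pi * j / 400 for j in range(401)]
for m, n in [(1, 10), (10, 50), (50, 200)]:
    print(m, n, block_sup(m, n, grid), block_bound(m, n))
```

In each row the first number should not exceed the second, and both shrink as m grows, which is the Cauchy criterion used in the proof.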


EXERCISES 15.4 


1. Use the Weierstrass M-test to show that:


(a) $\sum \frac{\sin^n x}{n}$ converges uniformly on (-1, 1).

(b) Le In(x))" converges uniformly on (1/2, 1]. 

(c) 2 nee converges uniformly on [a, 0) for any a > 0. 


2. (a) Does the series in Exercise 1.a converge uniformly on $[-\pi/2, \pi/2]$?
(b) Does the series in Exercise 1.b converge uniformly on (0, 1]?


(c) Does the series in Exercise 1.c converge uniformly on all of R? 


15.5 POWER SERIES 


DEFINITION 15.13: A series of the form $\sum a_n (x - a)^n$ is called a power
series about a.


Power series and their uses are familiar from calculus. We can study the
exponential function, for example, by observing that term-by-term
differentiation of the series $\sum_{n=0}^{\infty} \frac{x^n}{n!}$ yields precisely
the same series. This is a much deeper statement than it might seem. We will
concentrate on those parts of the theory that justify statements like this (but we
will not finish examining the issue until Chapter 18).


One key to the study of power series is the analysis of the sets on which they 
converge. That this question has a complete answer comes as a pleasant surprise. 
The limit superior was defined in Exercise 10.4.10. 


THEOREM 15.14: Let $\rho = \limsup(|a_n|^{1/n})$ and $R = 1/\rho$ (if $\rho = 0$,
let $R = \infty$; if $\rho = \infty$, let R = 0). Then $\sum a_n (x - a)^n$
converges absolutely if $|x - a| < R$ and diverges if $|x - a| > R$. The number R
is called the radius of convergence of the series $\sum a_n (x - a)^n$.

PROOF: Simply note that

\[
\limsup \sqrt[n]{|a_n (x - a)^n|} = |x - a| \limsup \sqrt[n]{|a_n|} = |x - a|/R,
\]

and the result follows from the root test (Exercise 13.8.5). ∎


EXAMPLES 15.5: 1. Consider $\sum_{n=0}^{\infty} \frac{x^n}{2^n}$. Here
$\rho = \limsup (1/2^n)^{1/n} = \limsup 1/2 = 1/2$. The radius of convergence of
the series is 2.


2. You may have observed two things about the series in Example 1: The lim sup
construction is unnecessarily complicated since $\lim (1/2^n)^{1/n}$ exists. More
importantly, you probably remember series like this from calculus, where you
dealt with them a little differently.


3. Here is an example where the full power of Theorem 15.14 is needed:
$\sum \cos(n) x^n$. Notice that $(|\cos(n)|)^{1/n} \le 1$ for all n, but for any
$\varepsilon > 0$, there are infinitely many values of n so that
$|\cos(n)| > 1 - \varepsilon$ (you will show this in Exercise 15.5.1). For each
such value of n, $(|\cos(n)|)^{1/n}$ is also very close to 1. Putting these two
observations together, we see that, for any $\varepsilon > 0$, there are
infinitely many n for which $|\cos(n)|^{1/n} > 1 - \varepsilon$, so that
$\limsup(|\cos(n)|^{1/n}) = 1$. Thus R = 1. This would have been very difficult to
establish by any other method, and it still doesn't tell us anything about the
limit of the series.


4. Of course, we do have other methods that are familiar from calculus for
dealing with power series. By the ratio test, we see that
$\sum_{n=0}^{\infty} \frac{x^n}{n!}$ converges absolutely whenever
$\lim \frac{|x^{n+1}/(n+1)!|}{|x^n/n!|} < 1$. But this limit is 0 for any value of
x, so this series converges for any value of x (that is, $R = \infty$), a
possibility allowed by Theorem 15.14. Incidentally, we can conclude from this (and
Exercise 10.4.10.g) that $\lim (1/n!)^{1/n} = 0$, a fact that is a bit tricky to
show directly.


5. Suppose we construct a series $\sum_{n=0}^{\infty} a_n x^n$ by choosing
$a_n = \pm 1$ at random (by flipping a coin, say). Then $|a_n| = 1$ for all n, and
so $\rho = 1$ and R = 1. We can find the radius of convergence of this series
without being able to say anything about its value for any specific x (compare
this with Exercise 13.9.5).
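
The quantities |a_n|^{1/n} in Examples 1, 3, and 4 can be inspected numerically. The sketch below is an illustration under assumptions of our own: the helper names are hypothetical, logarithms are used only to avoid underflow, and a maximum over a finite window of 50 indices stands in for the lim sup, which is a heuristic and not a proof.

```python
import math

def nth_root_abs(log_abs_coeff, n):
    """Return |a_n|**(1/n), computed from log|a_n| to avoid underflow."""
    return math.exp(log_abs_coeff(n) / n)

def log_two_pow(n):      # a_n = 1/2**n  (Example 1)
    return -n * math.log(2.0)

def log_cos(n):          # a_n = cos(n)  (Example 3)
    return math.log(abs(math.cos(n)))

def log_inv_fact(n):     # a_n = 1/n!    (Example 4), via the log-gamma function
    return -math.lgamma(n + 1)

for n in (10, 100, 1000, 10000):
    window_max = max(nth_root_abs(log_cos, k) for k in range(n, n + 50))
    print(n, nth_root_abs(log_two_pow, n), window_max, nth_root_abs(log_inv_fact, n))
```

The first column of values stays at 1/2, the second creeps toward 1, and the third drifts toward 0, consistent with R = 2, R = 1, and R = ∞ respectively.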


Theorem 15.14 narrows down the search for the convergence set of a series 
considerably. When discussing series of functions, though, convergence (even 
absolute convergence) is usually not as important as uniform convergence. 


THEOREM 15.15: Suppose the series $\sum a_n (x - a)^n$ has radius of convergence
R. If 0 < r < R (if $R = \infty$, this is true for any r > 0), then
$\sum a_n (x - a)^n$ converges uniformly on [a - r, a + r].


PROOF: Note that $\sum a_n x^n$ converges absolutely for $|x| < R$. If x is in the
interval [a - r, a + r], then $|x - a| \le r$. If n > m, we have

\[
\begin{aligned}
\|a_m (x - a)^m + \cdots + a_n (x - a)^n\|_\infty
&\le \|a_m (x - a)^m\|_\infty + \cdots + \|a_n (x - a)^n\|_\infty \\
&\le |a_m r^m| + \cdots + |a_n r^n|,
\end{aligned}
\]

which can be made as small as we like because $\sum a_n r^n$ converges
absolutely. The result follows from Exercise 15.2.17. ∎


Every $x \in (a - R, a + R)$ is in an interval [a - r, a + r] for some r < R. This
means that any such x is in some set where $\sum a_n (x - a)^n$ converges
uniformly. Some of the consequences of this will be examined in Chapter 18.
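
As an illustration of this theorem, consider the geometric series Σ x^n, for which a = 0 and R = 1. The series is our choice of example, and the helper `tail_sup` is a hypothetical name. The tail after N terms has the closed form x^{N+1}/(1 - x), so its supremum over [-r, r] can be approximated on a grid. This is a sketch under those assumptions.

```python
def tail_sup(N, r, grid_size=2001):
    """Grid approximation of the sup over [-r, r] of |x**(N+1) / (1 - x)|,
    the tail of the geometric series after the term x**N (requires 0 < r < 1)."""
    xs = [-r + 2 * r * j / (grid_size - 1) for j in range(grid_size)]
    return max(abs(x ** (N + 1) / (1 - x)) for x in xs)

for r in (0.5, 0.9, 0.99):
    print(r, [tail_sup(N, r) for N in (10, 100, 1000)])
```

For each fixed r < 1 the tails go to 0, so convergence is uniform on [-r, r], but the closer r is to R = 1 the more terms are needed. This matches the theorem's restriction to r < R.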


EXERCISES 15.5 


1. Show that, for any $\varepsilon > 0$, there are infinitely many values of n so
that $\cos(n) > 1 - \varepsilon$.


2. Show that the radius of convergence of $\sum a_n (x - a)^n$ can be found by
letting $R = \lim |a_n / a_{n+1}|$ if this limit exists. (While this may seem to
be easier to compute than the expression in the chapter, the lim sup is more
likely to exist than the limit.)


3. (a) Suppose $\sum a_n (x - a)^n$ has radius of convergence R. If S is a compact
subset of (a - R, a + R), show that $\sum a_n (x - a)^n$ converges uniformly on S.


(b) The requirement in (a) that S be compact is unnecessarily strong. Find a 
weaker condition on S that will give the same result. 


4. Find the radius of convergence of the power series $\sum a_n x^n$, where the
coefficients $a_n$ are given by:


+ ee es rs 
Sa st = 14°71, ...<. 
2 n 
a l 7 = _ / 

a, = aye Bye i t= 2.5, 8.5.0: 
4 n, 
a. Tr 
—-+- be oe des 
4 n- 


5. Let $(a_n)$ be given and let $A = \lim_{x \to 1^-} \sum_{n=0}^{\infty} a_n x^n$
if this limit exists.


(a) Show that $A = \sum_{n=0}^{\infty} a_n$ if $\sum_{n=0}^{\infty} a_n$ converges.


(b) Give an example to show that A might exist even if the original series
$\sum a_n$ diverges. (This process is called Abel summation. It represents
another way to interpret the sum of a series. See also Exercise 13.11.2.)


¹ A sequence of functions is actually a function whose domain is the natural numbers and whose range
is some set of functions, but we need not deal in this level of technicality now. 


² Tradition has saddled us with this unfortunate phrase, but remember that this brilliant mathematician's
name is not an adjective. 


³ This is actually the definition of sequential compactness, which is not quite the same as
compactness, but the difference need not concern us now. Exercise 11.3.8 says that, on the real line at least, 
sequential compactness and compactness are the same. 


⁴ This proof is very much like those of Theorems 9.7 and 15.4, and it would be a good idea to review
those proofs before reading this one. 


Chapter 16 


Differentiation 


16.1 A NEW SLANT ON DERIVATIVES


Differentiation is surely the most familiar topic from calculus. It is also the topic 
that is usually discussed with the most rigor. Since everyone is well acquainted 
with the derivative as a "limit of difference quotients," we will take the 
opportunity to view the subject in another way. Our approach is not at all 
unusual; it is just not the one typically taken in elementary textbooks. It is often
said that calculus is "the study of change." A review of the subject suggests that 
it is equally valid to view calculus as the study of the approximation of 
complicated things by simpler ones. 


When we ask whether a sequence converges, we are asking how well its 
limiting behavior (something we can't "touch") can be approximated by its 
individual terms (something we can). The convergence of a series is a question of
how well an infinite sum can be approximated by a finite sum. Integration seems
to be a process by which complicated areas are approximated by rectangles
(though this, too, is an oversimplification). The $\varepsilon$-$\delta$ definition of continuity can
be thought of (as at the beginning of Chapter 14) as a statement about the 
approximation of a value of a function that we might not be able to compute by a 
value that we can compute. 


Expanding on this view of calculus, we may say that differentiation is the 
study of the approximation of functions by straight lines. When we think this 
way, we see that the connection between the derivative and the slope of the 
tangent line is not just a remarkably valuable by-product of the process but the 
fundamental issue. The definition of the derivative is not so much about limits as 
it is about the relationship between two graphs. 


DEFINITION 16.1: The function $f : \mathbb{R} \to \mathbb{R}$ is differentiable at
the point a if there is a number L and a function $\eta(t)$ so that
$\eta(t) \to 0$ as $t \to 0$, and
$f(x) - f(a) - L(x - a) = (x - a)\eta(x - a)$. Such a number L is called the
derivative of f at a and is denoted $f'(a)$.


It is convenient to remember this definition by noting that both sides of the
equation approach 0 even after they are divided by (x - a).


NOTE WELL: On the left side of the equation in the definition, and the first
time it appears on the right, the expression (x - a) is multiplied by the things
around it. On the other hand, the expression $\eta(x - a)$ represents the
evaluation of the function $\eta(t)$ at the number (x - a). It is especially
important here not to confuse multiplication with evaluation of a function.


EXAMPLES 16.1: 1. Let $f(x) = x^2$. We will check that $f'(1) = 2$. Putting the
function and our guess of the derivative into the definition we have

\[
\begin{aligned}
f(x) - f(a) - L(x - a) &= x^2 - 1 - 2(x - 1) \\
&= x^2 - 2x + 1 \\
&= (x - 1)(x - 1),
\end{aligned}
\]

which approaches zero even after it is divided by x - 1. [To be precise, we can
say that the definition is satisfied by letting $\eta(t) = t$.]


2. The choice of the number 2 in the previous example was a particularly good
one. If, for instance, we had guessed that $f'(1) = 5$, we would find:

\[
\begin{aligned}
x^2 - 1 - 5(x - 1) &= x^2 - 5x + 4 \\
&= (x - 1)(x - 4),
\end{aligned}
\]

which does approach 0 as $x \to 1$ but does not approach 0 after it is divided by
x - 1 [to make this resemble the definition, we would have to let
$\eta(t) = t - 3$, but this does not go to 0 as $t \to 0$].


3. Consider f(x) = |x| at the point a = 0. Based on the appearance of its graph, we
might guess that $f'(0) = 0$. But then

\[
\begin{aligned}
f(x) - f(a) - L(x - a) &= |x| - 0 - 0(x - 0) \\
&= |x|.
\end{aligned}
\]

The definition of the derivative would require that this expression approach 0
after it is divided by x, but this is not the case since $\lim_{x \to 0} |x|/x$
does not exist (even though $\lim_{x \to 0} |x| = 0$). Consequently, the derivative
of f at a = 0 is not 0. A similar argument can be used to show that the derivative
of f doesn't exist at all for a = 0.
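
The three examples can be replayed numerically by solving the equation in Definition 16.1 for η. In the sketch below, the helper names `eta` and `square` are introduced here for illustration, and the sample values of t are arbitrary; η(t) is computed as (f(a + t) - f(a) - Lt)/t for t ≠ 0, and we watch whether it tends to 0.

```python
def eta(f, a, L, t):
    """Solve f(a + t) - f(a) - L*t = t * eta(t) for eta(t); requires t != 0."""
    return (f(a + t) - f(a) - L * t) / t

def square(x):
    return x * x

for t in (0.1, -0.01, 0.001, -0.0001):
    print(t,
          eta(square, 1.0, 2.0, t),   # Example 1: eta(t) = t, which tends to 0
          eta(square, 1.0, 5.0, t),   # Example 2: eta(t) = t - 3, stays near -3
          eta(abs, 0.0, 0.0, t))      # Example 3: eta(t) = |t|/t, jumps between 1 and -1
```

Only the first guess produces values of η that shrink with t, matching the conclusion that L = 2 is the derivative of x² at 1 while the other two guesses fail.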


We might feel that, in order to find a derivative, we had to know the answer 
beforehand, while the usual process of finding limits seems to give us the 
answer. In a way this is true, but remember that to find a limit by the definition 
we also must know the answer beforehand. 


EXERCISES 16.1 


1. Show that the derivative of a straight line is its slope. (Think of a short, 
simple answer! ) 


2. (a) If $f(x) = x^3$, use the definition to show that $f'(1) = 3$.
(b) If $f(x) = x^2$, use the definition to show that $f'(a) = 2a$ for all a.


3. (a) A function f is said to have a "proper local maximum" at the point a if
there is a neighborhood U of a so that $f(x) < f(a)$ for
$x \in U \setminus \{a\}$. If f is differentiable at a and has a proper local
maximum there, show that $f'(a) = 0$.


(b) Prove the First Derivative test from calculus: If the continuous function f
has a critical point at a [that is, either $f'(a) = 0$ or f is not differentiable
at a] and there is an $\varepsilon > 0$ so that $f'(x) > 0$ for
$x \in (a - \varepsilon, a)$ and $f'(x) < 0$ for $x \in (a, a + \varepsilon)$,
then f has a proper local maximum at a.


(c) Show that a function can have only countably many proper local maxima.

4. (a) Suppose $f(x) \le g(x) \le h(x)$ for x in some neighborhood of a,
f(a) = h(a), f and h are differentiable at a, and $f'(a) = h'(a)$. Show that g is
differentiable at a and that $g'(a) = f'(a)$.

(b) Let D(x) be the Dirichlet function [D(x) = 1 if $x \in \mathbb{Q}$ and
D(x) = 0 if $x \notin \mathbb{Q}$]. Show that $x^2 D(x)$ is differentiable at
x = 0.


(c) Show that $f'(a) > 0$ does not imply that f is increasing in any
neighborhood of a.


(d) Doesn't (c) contradict a well-known fact from calculus?
(e) Show that the other conditions on f and h in (a) imply that $f'(a) = h'(a)$.


5. Suppose f is differentiable at a and let L(x) = f(a) + A(x - a).


(a) If $A > f'(a)$, show that there is a $\delta > 0$ so that L(x) < f(x) for
$x \in (a - \delta, a)$ and L(x) > f(x) for $x \in (a, a + \delta)$, and a similar
result if $A < f'(a)$.


(b) Use this result to explain (yet again) why f(x) = |x| is not differentiable at
a = 0.


(c) Use this result to explain how it can be that the tangent line of
$f(x) = x^3$ can cross the graph.


(d) Discuss how this property might be used to define the derivative. 


(e) Is there a similar result concerning second derivatives? Consider the
parabola $P(x) = f(a) + f'(a)(x - a) + (B/2)(x - a)^2$. What can be said if
$B > f''(a)$?


6. One of the great advantages in looking at derivatives as we have done in this
section (rather than as limits of difference quotients) is that our definition can
be used, pretty much as is, for vector functions. If the inputs of our function
are vectors, we can't divide by the expression x - a, but we can make sense
of the construction we have used here. First notice that multiplying by L is a
linear function from $\mathbb{R}$ to $\mathbb{R}$, then ...


DEFINITION: Let $f : \mathbb{R}^2 \to \mathbb{R}$. Then f is differentiable at the
point (a, b) if there is a linear function $L : \mathbb{R}^2 \to \mathbb{R}$ and a
function $\eta(t)$ so that $\eta(t) \to 0$ as $t \to 0$, and

\[
f(x, y) - f(a, b) - L(x - a, y - b) = \|(x - a, y - b)\| \, \eta(\|(x - a, y - b)\|).
\]

(Notice that the occurrence of L in this definition is a function evaluation
rather than a multiplication.)


(a) Which aspects of the vector space structure of R* are needed to make 
sense of this definition? 


(b) Considering (a), restate the definition even more generally. 


(c) Show that every linear function L : R² → R can be written as a dot
product with some vector; that is, for any such L, there is an ordered pair (A,
B) so that L(x, y) = (A, B) · (x, y) (you may have already proved this in linear
algebra).


(d) Suppose that f : R² → R has a derivative at (a, b). Show that the partial
derivatives f_x and f_y both exist at (a, b), and that the derivative of f at (a, b) is


(f_x(a, b), f_y(a, b)).


(e) Suppose that both partial derivatives of a function f : R² → R exist. Is the
function necessarily differentiable? 


(f) Discuss the derivative of a function f : Rⁿ → Rᵐ. How would it be
defined? What form would the derivative take? 


16.2 ORDER OF MAGNITUDE ESTIMATES 


The definition of the derivative is a statement about how quickly something gets 
small (the expressions in the definition "get small faster than x — a"). It will be 
useful to have a way of measuring such phenomena. These notational 
conventions are called "Landau symbols." Their use will simplify many 
calculations. The relationships they symbolize are sometimes called "order of 
magnitude estimates." 


DEFINITION 16.2: (a) We write f = O(g) as x → a if there are positive
numbers δ and M so that |f(x)| ≤ M|g(x)| whenever |x − a| < δ. This is
pronounced "f is big oh of g."

(b) We write f = o(g) as x → a if for every ε > 0 there is a δ > 0 so that |f(x)| ≤
ε|g(x)| when |x − a| < δ. We say "f is little oh of g."

(c) If lim_{x→a} f/g = 1, we say f and g are asymptotic and write f ~ g. (We will
have little occasion to use this symbol.)


The notation "as x → a" is usually omitted if it is clear from the context. Note
that f = O(g) if and only if f(x)/g(x) is bounded on some interval (a − δ, a + δ),
and f = o(g) if and only if lim_{x→a} f(x)/g(x) = 0. One also can make order of
magnitude estimates as the variable approaches infinity by replacing the phrase
"∃δ ... |x − a| < δ" in the definition with "∃B ... x > B." You will check the
following statements in Exercise 16.2.1.


x² = o(x) as x → 0
sin(x) ~ x as x → 0
7 * hed is Le Li . / 2 \\ 
zx? = O((7z° — 172x°) / (x* + 14z)) as t — co 


a — O(2!") as © — oo. 
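
A numerical sketch (not a proof) can make the first two statements plausible. The helper name `ratios` and the sample points are illustrative assumptions, not part of the text: if x² = o(x) as x → 0, the ratio x²/x should shrink toward 0, and if sin(x) ~ x, the ratio sin(x)/x should approach 1.

```python
import math

def ratios(f, g, points):
    """Return f(x)/g(x) at each sample point (hypothetical helper)."""
    return [f(x) / g(x) for x in points]

# Sample points approaching 0 from the right (an arbitrary choice).
points = [10.0 ** (-k) for k in range(1, 6)]

little_o = ratios(lambda x: x * x, lambda x: x, points)   # x^2 / x
asymptotic = ratios(math.sin, lambda x: x, points)        # sin(x) / x

for x, r1, r2 in zip(points, little_o, asymptotic):
    print(f"x={x:.0e}  x^2/x={r1:.1e}  sin(x)/x={r2:.12f}")
```

A table like this only suggests the limits; the statements themselves still have to be proved from Definition 16.2.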


Here is one result describing functions related by order of magnitude estimates. 
Its simple proof, and others like it, are left as Exercise 16.2.6. 


THEOREM 16.3: If f = o(g) as x → a and h = o(g) as x → a, then f + h = o(g)
and f − h = o(g) as x → a. ■


This result can be abbreviated: "o + o = o." Let's get back to derivatives. Using
the Landau symbols, the definition can be restated: 


f is differentiable at a and f'(a) = L if and only if 
f(x) − f(a) − L(x − a) = o(x − a).


The Landau symbols give a neat appearance to our work. We will soon see that 
their use also has real benefit. This is a wonderful example of a situation where 
nothing deeper than a choice of notation can greatly clarify a subject. Perhaps 
the choice of notation is "deep" after all! 


EXERCISES 16.2 


1. (a) Show f = O(g) as x → a if and only if f(x)/g(x) is bounded on some
interval (a − δ, a + δ), and f = o(g) as x → a if and only if lim_{x→a} f(x)/g(x) =
0.


(b) Check the statements following Definition 16.2.


2. (a) If K is a constant, show that K(x — a) = o(x — a) if and only if K = 0. 
(b) Show that K(x — a) = O(x — a) for any K. 

3. The statement "If f = o(g) then f = O(g)" can be abbreviated: o = O. Prove
this. 

4. (a) Show that f = o(1) as x → a if and only if f(x) → 0 as x → a.


(b) Show that f= O(1) as x — a if and only if f(x) is bounded on some 
neighborhood of a. 


(c) Show that if f ~ g, then f = O(g) and g = O(f).
(d) Show that the converse of (c) is not true.


5. Show that if f(t) = o(g(t)) as t → a, and h(t) → a as t → b, then f ∘ h(t) = o(g ∘
h(t)) as t → b.


6. (a) Interpret and prove:
(i) o + o = o
(ii) o + O = O
(iii) O + O = O
(iv) o(O) = o [(iv) and (v) represent compositions of functions]
(v) O(o) = o
(vi) o × o = o
(vii) o × O = o
(b) What can be said about O × O?


16.3 BASIC DIFFERENTIATION THEOREMS 


We have shown that the derivative of x² at x = 1 is 2 and not 5. Does the
definition exclude the possibility that some number other than 2 could work?
Yes:


THEOREM 16.4: A function can have only one derivative at a point.


PROOF: Suppose it happens both that f(x) − f(a) − L₁(x − a) = o(x − a) and that
f(x) − f(a) − L₂(x − a) = o(x − a). Then, by Theorem 16.3,


[f(x) − f(a) − L₁(x − a)] − [f(x) − f(a) − L₂(x − a)] = o(x − a).


Now the left side of this equality is (L₂ − L₁)(x − a), and this can be o(x − a)
only if L₁ = L₂, by Exercise 16.2.2.a. ■
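
The uniqueness statement can be illustrated numerically with the running example f(x) = x² at a = 1. This is only a sketch under stated assumptions: the function name `remainder_quotient` and the step sizes are hypothetical choices. The quotient [f(x) − f(a) − L(x − a)]/(x − a) should tend to 0 when L = 2 and should stay away from 0 when L = 5.

```python
def remainder_quotient(f, a, L, x):
    """(f(x) - f(a) - L*(x - a)) / (x - a); tends to 0 exactly when L works as the derivative."""
    return (f(x) - f(a) - L * (x - a)) / (x - a)

f = lambda x: x * x
a = 1.0
steps = [10.0 ** (-k) for k in range(1, 7)]   # h = 0.1, 0.01, ..., 1e-6

good = [remainder_quotient(f, a, 2.0, a + h) for h in steps]  # candidate L = 2
bad = [remainder_quotient(f, a, 5.0, a + h) for h in steps]   # candidate L = 5

for h, g, b in zip(steps, good, bad):
    print(f"h={h:.0e}  L=2: {g:.2e}   L=5: {b:.6f}")
```

Algebraically the first quotient equals h and the second equals h − 3, so only L = 2 leaves a remainder that is o(x − a).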


With Theorem 16.4 in hand, we can prove the following reassuring result. 
THEOREM 16.5: The function f is differentiable at a if and only if


\[ \lim_{x \to a} \frac{f(x) - f(a)}{x - a} \]


exists. If so, this limit is f'(a).


PROOF: This proof is made easier because we know what the derivative should
be. First, suppose f is differentiable at a and that f'(a) = L. We show that the limit
exists by showing that it is equal to L:


\[
\lim_{x \to a} \left| \frac{f(x) - f(a)}{x - a} - L \right|
= \lim_{x \to a} \left| \frac{f(x) - f(a) - L(x - a)}{x - a} \right|
= 0,
\]


since the numerator of the fraction in the second line is o(x − a).

Now suppose lim_{x→a} [f(x) − f(a)]/(x − a) = L. Here we need to show that f(x) − f(a) −
L(x − a) = o(x − a). By Theorem 16.4, this will tell us that
L = f'(a). Let η(t) = [f(a + t) − f(a)]/t − L (note that η(t) → 0 as t → 0). Then


\[
f(x) - f(a) - L(x - a)
= \left[ \frac{f(x) - f(a)}{x - a} - L \right](x - a)
= \eta(x - a)(x - a)
= o(x - a).
\]

■

With Theorem 16.5, we may prove the following result in the usual way. 
THEOREM 16.6: If f is differentiable at a, it is continuous at a. ■


Theorem 16.5 returns differentiation to a familiar setting, and the proofs of the 
basic theorems of differentiation can be referred back to a calculus course. Let us 
investigate some of those results in this new setting, though. 


THEOREM 16.7: Suppose f and g are differentiable at a and c ∈ R. Then
(a) f + g is differentiable at a, and (f + g)'(a) = f'(a) + g'(a).

(b) cf is differentiable at a, and (cf)'(a) = cf'(a).

(c) fg is differentiable at a, and (fg)'(a) = f(a)g'(a) + f'(a)g(a).

(d) f/g is differentiable at a, and


\[ \left( \frac{f}{g} \right)'(a) = \frac{f'(a)g(a) - f(a)g'(a)}{(g(a))^2} \]


[as long as there is a neighborhood of a in which g(x) ≠ 0].


PROOF: We will prove only part (c), leaving the rest as Exercise 16.3.2. In
view of Theorem 16.4, we need only show that


f(x)g(x) − f(a)g(a) − [f(a)g'(a) + f'(a)g(a)](x − a) = o(x − a).


Using the trick of adding and subtracting something on the left side of the
equation [here it will be f(a)g(x)] and rearranging terms, we find:


\[
\begin{aligned}
& f(x)g(x) - f(a)g(a) - [f(a)g'(a) + f'(a)g(a)](x - a) \\
&\quad = [f(x) - f(a)]g(x) - f'(a)g(a)(x - a) \\
&\qquad + [g(x) - g(a)]f(a) - g'(a)f(a)(x - a).
\end{aligned}
\]


The last two terms combine to become f(a)[g(x) — g(a) — g'(a)(x — a)], which is 
o(x — a) regardless of the value of f(a) since g is differentiable at a. We can't 
dispose of the first two terms so easily. How is g(x) related to g(a)? Since g is 
differentiable at a, g(x) = g(a) + g'(a)(x − a) + o(x − a). Inserting this
observation into the first term, we have 


\[
\begin{aligned}
& [f(x) - f(a)]g(x) - f'(a)g(a)(x - a) \\
&\quad = [f(x) - f(a)][g(a) + g'(a)(x - a) + o(x - a)] - f'(a)g(a)(x - a) \\
&\quad = [f(x) - f(a) - f'(a)(x - a)]g(a) \\
&\qquad + [f(x) - f(a)]g'(a)(x - a) + [f(x) - f(a)]o(x - a).
\end{aligned}
\]


The first of these terms is o(x − a) since f is differentiable at a. Since f is
continuous at a, f(x) − f(a) → 0 as x → a. Thus the second and third terms are
also o(x − a). We have shown that


\[
\begin{aligned}
& f(x)g(x) - f(a)g(a) - [f(a)g'(a) + f'(a)g(a)](x - a) \\
&\quad = o(x - a) + o(x - a) + o(x - a) + o(x - a) \\
&\quad = o(x - a),
\end{aligned}
\]


and we are done. ■


This proof shows the usefulness of the Landau notation. As each part of the sum 
is shown to be the "right size"—here this means o(x — a)—we can pretty much 
set it aside. The specific content of these expressions is no longer important. If 
we were to do this proof using the standard definition, we would have to keep 
track of these things more carefully. More importantly, the notation allows us to 
deal with equalities rather than estimates and inequalities. In this proof we have 
used statements like "(something that goes to 0) × (x − a) = o(x − a)," which you
verified in Exercise 16.2.6. 


We will prove one more standard theorem from calculus. The Chain rule,
certainly the most important differentiation formula, is treated badly in some
calculus texts. The "obvious" proof, actually presented in some texts, is simply
incorrect (though it can be repaired). You will examine and fix it in Exercise
16.3.3.


THEOREM 16.8: If g is differentiable at a and f is differentiable at g(a), then
the composition f ∘ g is differentiable at a and


(f ∘ g)'(a) = f'(g(a))g'(a).


PROOF: We need to show that


(f ∘ g)(x) − (f ∘ g)(a) − f'(g(a))g'(a)(x − a) = o(x − a).


Here we must be a bit clever. In the expression f(g(x)) — f(g(a)), we have g(x) 
and g(a) inserted into f where we would like to see x and a. But we can get 
around this problem. Since g is continuous at a, we know that g(x) − g(a) → 0 as
x → a. We may replace x and a in the definition with g(x) and g(a) (see Exercise
16.2.6), to obtain 


f(g(x)) − f(g(a)) − f'(g(a))[g(x) − g(a)] = o(g(x) − g(a)).


Since g is differentiable at a, we have g(x) − g(a) = g'(a)(x − a) + o(x − a).
Plugging this into the last equality gives us


f(g(x)) − f(g(a)) − f'(g(a))[g'(a)(x − a) + o(x − a)] = o(g'(a)(x − a) + o(x − a))


or


f(g(x)) − f(g(a)) − f'(g(a))g'(a)(x − a) = o(g'(a)(x − a) + o(x − a)) + f'(g(a))o(x − a).


The last term is o(x − a). We can simplify the first term by observing that g'(a)(x
− a) = O(x − a). You proved in Exercise 16.2.6 statements like O + o = O and
o(O) = o, which together give us the result. ■


In this proof, the specific content of the expression g’(a)(x — a) was needed at 
one stage (on the left side of the last equation), while at another we only needed 
to know that g'(a)(x — a) = O(x — a). 
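
As a sanity check of the statement being proved, one can watch the chain-rule remainder divided by x − a tend to 0 for a concrete pair of functions. The choices f = sin, g(x) = x², a = 0.5, and the name `chain_remainder_quotient` are assumptions made only for this sketch.

```python
import math

def chain_remainder_quotient(x, a=0.5):
    """[f(g(x)) - f(g(a)) - f'(g(a)) g'(a) (x - a)] / (x - a) for f = sin, g(t) = t^2."""
    g = lambda t: t * t
    dg = lambda t: 2 * t
    f, df = math.sin, math.cos
    linear_part = df(g(a)) * dg(a) * (x - a)
    return (f(g(x)) - f(g(a)) - linear_part) / (x - a)

# Evaluate at x = a + h for h = 0.1, 0.01, ..., 1e-6.
quotients = [chain_remainder_quotient(0.5 + 10.0 ** (-k)) for k in range(1, 7)]

for k, q in enumerate(quotients, start=1):
    print(f"h=1e-{k}  quotient={q:.3e}")
```

The quotients shrink roughly in proportion to h, which is consistent with the remainder being o(x − a).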


EXERCISES 16.3 


1. (a) Prove Theorem 16.6 in the usual way. 
(b) If f(x) − f(a) = O(x − a), show that f is continuous at a.


(c) Prove Theorem 16.6 using the definition of differentiability given in the 
chapter. 


(d) Give an example of a function f and a point a such that f is continuous at
a but it is not the case that f(x) − f(a) = O(x − a).
2. Complete the proof of Theorem 16.7. 


3. (a) What is wrong with the following "proof" of the chain rule?


(I) Note that

\[ \frac{f(g(x)) - f(g(a))}{x - a} = \frac{f(g(x)) - f(g(a))}{g(x) - g(a)} \cdot \frac{g(x) - g(a)}{x - a}. \]


(II) The middle fraction approaches f'(g(a)), while the one on the right
approaches g'(a), so the whole thing approaches f'(g(a))g'(a).


(b) Find a way to use this approach in a valid proof. 

4. (a) If f(x) = x, show that f'(a) = 1 for all a.
(b) If f(x) = xⁿ, n ∈ N, show that f'(a) = naⁿ⁻¹ for any a.
(c) Let f(x) = 1/x and a > 0. Show that f'(a) = −1/a².


5. Suppose f and g each have n derivatives at a point a. Find a formula for the
nth derivative of the product fg at a (this is called Leibniz' Rule). 


6. (a) Suppose that f is differentiable at a point a. Show that


\[ f'(a) = \lim_{h \to 0} \frac{f(a + h) - f(a - h)}{2h}. \]


(b) Draw a picture that describes the fraction on the right. 


(c) The expression on the right in (a) is called the symmetric derivative of f
at a. Give an example of a function that has a symmetric derivative at a point
but is not differentiable there.


(d) (Difficult!) At how many points can a function have a symmetric
derivative but not a derivative?


16.4 THE MEAN VALUE THEOREM 


The Mean Value theorem gives theoretical muscle to virtually all the important 
results of calculus. In this section we examine the collection of results leading up 
to the Mean Value theorem. The road to the Mean Value theorem is not really 
very long, and the road itself was pretty well constructed in calculus. All we 
need to do now is to find the entrance ramp. We will go a bit out of our way to 
use our definition of the derivative instead of limit quotients, but we will arrive 
in familiar territory. 


We highlight the first lemma in part because it is the main step in the one that 
follows it, and because it brings us perilously close to jumping to a bad 
conclusion. We might try to say something like "if f'(a) > 0, then f is increasing
at a." But the phrase "increasing at a" is not precisely defined, and the local 
behavior of a function whose derivative is positive at a single point may not be 
what we expect (see Exercise 16.1.4). 


LEMMA 16.9: Suppose f is differentiable at a and that f'(a) ≠ 0. There is an ε >
0 so that if x₁ ∈ (a − ε, a) and x₂ ∈ (a, a + ε), then one of f(x₁) and f(x₂) is greater
than f(a) and the other is less than f(a).


PROOF: We must take advantage of the approximation of a function by its
tangent line. Let us assume f'(a) > 0. We will show that there is an ε > 0 so that
f(b) > f(a) for all b ∈ (a, a + ε) (the other part of the proof is similar). By the
definition of the derivative, we know that f(x) = f(a) + f'(a)(x − a) + η(x − a)(x −
a) and so, for any b > a,


f(b) − f(a) = f'(a)(b − a) + η(b − a)(b − a) = [f'(a) + η(b − a)](b − a).


We want f(b) − f(a) > 0. Since f'(a) > 0 and b − a > 0, we need only be sure that
|η(b − a)| < f'(a) for this to be true. But η(b − a) → 0 as b → a, and so we may
pick ε so that |η(b − a)| < f'(a) whenever b ∈ (a, a + ε). ■


Recall that a function f has a local maximum at a [and we say f(a) is a local
maximum] if there is an ε > 0 such that f(x) ≤ f(a) for any x ∈ (a − ε, a + ε).
Local minimum is defined similarly, and a local extreme is a number that is
either a local maximum or minimum. Lemma 16.10 is a corollary to Lemma
16.9, but we call it a lemma because of the role it plays in what follows. The
proof is left as Exercise 16.4.2.


LEMMA 16.10: If f is differentiable at a and f has a local extreme at a, then
f'(a) = 0. ■


THEOREM 16.11: (Rolle's Theorem) Suppose f : [a, b] → R is continuous, that
f is differentiable at each point of (a, b), and that f(a) = f(b) = 0. Then there is a
point c ∈ (a, b) with f'(c) = 0.


PROOF: By the Extreme Value theorem, f has a maximum and a minimum on
[a, b]. If both these values are 0, then f(x) = 0 for all x, and so f'(x) = 0 for all x.
Any element of (a, b) will serve as the point c. Now suppose that the maximum
value of f is f(c) and f(c) > 0 (a similar argument applies if we know only that the
minimum is negative). Since f(c) > 0, c is neither a nor b, and by hypothesis f is
differentiable at c. Now f(c) is also a local maximum, and so by Lemma 16.10,
f'(c) = 0. ■


Notice that the condition f(a) = f(b) = 0 can easily be replaced by the condition
f(a) = f(b). The completeness of the real numbers enters this proof in a small but
crucial way. Extreme values of a function are the only ones for which we can
guarantee that f' = 0 (though it might happen that f' = 0 at other points), and the
completeness of the real numbers guarantees the existence of extreme values.
Rolle's theorem is not true for functions whose domains are intervals of rational
numbers.


THEOREM 16.12: (The Mean Value Theorem) Suppose f : [a, b] → R is
continuous and that f is differentiable at each point of (a, b). There is a point
c ∈ (a, b) with f'(c) = (f(b) − f(a))/(b − a).


PROOF: This is one of those rare proofs that can be seen almost entirely with a
picture (below). We subtract from f the secant line through (a, f(a)) and (b, f(b)),
which we call S(x). Note that the slope of S is [f(b) − f(a)]/(b − a). Since F(x) =
f(x) − S(x) satisfies the hypotheses of Rolle's theorem (be sure to check this),
there is a real number c ∈ (a, b) with F'(c) = 0 (in this picture there are two such
points). But


\[ F'(c) = f'(c) - S'(c) = f'(c) - \frac{f(b) - f(a)}{b - a}, \]


which gives us what we want. ■
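
The theorem guarantees that a point c exists but gives no recipe for finding it. For a specific function one can still hunt for c numerically. The sketch below uses bisection on f'(c) − slope, which is a technique chosen for illustration and not the method of the proof; it assumes f' is continuous and changes sign relative to the slope on [a, b]. The example f(x) = x³ on [0, 2] and the name `mean_value_point` are hypothetical choices.

```python
def mean_value_point(df, slope, lo, hi, tol=1e-12):
    """Bisection for df(c) = slope, assuming df - slope changes sign on [lo, hi]."""
    h = lambda c: df(c) - slope
    if h(lo) * h(hi) > 0:
        raise ValueError("this sketch needs a sign change of f' - slope on [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if h(lo) * h(mid) <= 0:
            hi = mid      # a root lies in [lo, mid]
        else:
            lo = mid      # a root lies in [mid, hi]
    return (lo + hi) / 2

f = lambda x: x ** 3
df = lambda x: 3 * x ** 2
a, b = 0.0, 2.0
slope = (f(b) - f(a)) / (b - a)       # slope of the secant line S
c = mean_value_point(df, slope, a, b)
print(f"slope={slope}, c={c:.9f}, f'(c)={df(c):.9f}")
```

Here the exact answer is c = 2/√3, since 3c² = 4.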


EXERCISES 16.4 

1. Complete the proof of Lemma 16.9. 

2. Prove Lemma 16.10. 

3. Complete the proof of Theorem 16.11. 


4. Give an example to show that Rolle's theorem fails for functions defined on 
the rational numbers. 


5. (a) Prove the basic theorem of elementary calculus: If f : [a, b] → R is
continuous, then (1) f attains a maximum on [a, b] and (2) that maximum
occurs at (i) a point where f'(x) = 0, (ii) a point where f'(x) does not exist, or
(iii) a or b.

(b) Explain why it is not really necessary (though every calculus text does it) 
to single out a and b as possible solutions in part (a). 


6. (a) Let f : [a, b] → R and g : [a, b] → R both satisfy the hypotheses of the
Mean Value theorem. By considering the function
F(x) = (f(x) − f(a))(g(b) − g(a)) − (g(x) − g(a))(f(b) − f(a)), show that there is
a number c ∈ (a, b) such that


\[ \frac{f'(c)}{g'(c)} = \frac{f(b) - f(a)}{g(b) - g(a)}. \]


(This is called the Cauchy Mean Value theorem.) 


(b) By considering f and g to be the coordinate functions of a parametric 
curve, interpret the Cauchy Mean Value theorem geometrically. 


(c) Use the Cauchy Mean Value theorem to prove l'Hôpital's rule.


16.5 THE MEANING OF THE MEAN VALUE THEOREM 


The Mean Value theorem is one of the most important in calculus. This stems 
both from the nature of the theorem itself and the things that can be done with it. 
The Mean Value theorem is the first one we encounter in calculus that gives us 
something specific (the point c). Earlier theorems are more negative in tone 
("There are no differentiable functions that are not continuous.") But the real 
power of the Mean Value theorem lies in what can be done with it. The diagram 
in Chapter 10 reminds us that the Mean Value theorem is the bridge to both 
Taylor's theorem (which we will prove in the next section) and the Fundamental 
theorem (which we will prove in the next chapter). The first, and most
transparent, uses of the Mean Value theorem in calculus are in the following 
theorem. 


THEOREM 16.13: (a) If f : (a, b) → R is differentiable and f'(x) = 0 for all
x ∈ (a, b), then f is constant on (a, b).

(b) If f : (a, b) → R is differentiable and f'(x) > 0 for all x ∈ (a, b), then f is
increasing on (a, b).

(c) If f : (a, b) → R is differentiable and f'(x) < 0 for all x ∈ (a, b), then f is
decreasing on (a, b).


PROOF: We will prove (a); the rest is left as Exercise 16.5.1. Suppose f is not
constant. Then there are points x, y ∈ (a, b) with f(x) ≠ f(y). Then [f(y) − f(x)]/(y −
x) ≠ 0. By the Mean Value theorem, there is a point c between x and y with [f(y)
− f(x)]/(y − x) = f'(c) ≠ 0, a contradiction. ■


EXERCISES 16.5 
1. Complete the proof of Theorem 16.13. 
2. Suppose that f and g are differentiable on an open interval (a, b) and that f'(x)
= g'(x) for all x ∈ (a, b). Show that f(x) − g(x) is constant on (a, b). Where does
this come up in elementary calculus?


3. (a) Suppose f has the property that, for any x and y in the domain of f, |f(x) −
f(y)| ≤ |x − y|. Show that f is uniformly continuous.
(b) Suppose f has the property that there is a number α > 1 so that, for any x
and y in the domain of f, |f(x) − f(y)| ≤ |x − y|^α. Show that f is constant. (Hint:
Show that the derivative of f is always 0.)


4. By considering the derivative of the quotient f(x)/e^x, show that if f'(x) = f(x)
for all x and f(0) = 1, then f(x) = e^x.


16.6 TAYLOR POLYNOMIALS 


When we approximate a function by its tangent line, we suffer an error that is 
o(x − a). Perhaps we can get an error that is, say, o((x − a)²) by approximating
the function by a (not very much) more complicated object—a polynomial of 
degree higher than 1. Though an error of o(x — a) is enough to give us what we 
needed in the previous sections, it is not very delicate as a measuring device. 
There is much territory covered by "o(x — a)," and we might well desire a more 
precise statement of the error in some process. By a happy coincidence, we can 
obtain better approximations and better error estimates at the same time. 


DEFINITION 16.14: If f'(a), f''(a), ..., f^(n)(a) all exist, the nth Taylor
polynomial¹ of the function f at the point a is given by


\[ T_n(x) = \sum_{k=0}^{n} \frac{f^{(k)}(a)}{k!} (x - a)^k. \]


Here f^(0)(a) = f(a). Notice that a Taylor polynomial represents the approximation
of a complicated object (f) by a simpler one (T_n). As we get better at such
processes, we can expand our idea of what is "simple." The tangent line to a
function is special because it matches the values of both the function and its
derivative at the point where it is computed. If we form an approximation with a
polynomial, we can expect higher-order derivatives to match as well. You will
show in Exercise 16.6.3 that the values of the first n derivatives of T_n and f
match at a. The question remains how T_n is related to f at points other than a.
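
The definition translates directly into a few lines of code. The following is a minimal sketch, assuming the derivative values f(a), f'(a), ..., f^(n)(a) are already known and supplied as a list; the function name `taylor_polynomial` is a hypothetical choice.

```python
import math

def taylor_polynomial(derivatives_at_a, a, x):
    """Evaluate T_n(x) = sum_k f^(k)(a)/k! * (x - a)^k from [f(a), f'(a), ..., f^(n)(a)]."""
    return sum(d / math.factorial(k) * (x - a) ** k
               for k, d in enumerate(derivatives_at_a))

# Example: f = exp at a = 0, where every derivative equals 1.
t3 = taylor_polynomial([1.0, 1.0, 1.0, 1.0], 0.0, 1.0)   # 1 + 1 + 1/2 + 1/6
print(t3, math.exp(1.0))
```

With a one-element list the function returns the constant f(a), and with two elements it returns the tangent line evaluated at x.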


THEOREM 16.15: (Taylor's Theorem) If x > a, f^(n) is continuous on [a, x], and
f^(n+1) exists on (a, x), there is a point c ∈ (a, x) with


\[ f(x) = T_n(x) + \frac{f^{(n+1)}(c)}{(n+1)!} (x - a)^{n+1}. \]


(A similar statement holds if x < a.)


PROOF: We will view this as a problem of solving an equation. If x and a are given,
there is some number A_n(x) with f(x) = T_n(x) + A_n(x)(x − a)^(n+1). We want to show
that A_n(x) = f^(n+1)(c)/(n + 1)! for some c ∈ (a, x).

Here we resort to some trickery. Let


\[ F(t) = f(t) + f'(t)(x - t) + \cdots + \frac{f^{(n)}(t)(x - t)^n}{n!} + A_n(x)(x - t)^{n+1}. \]


By hypothesis, f^(n) is continuous on [a, x] and f, f', ..., and f^(n) are all
differentiable, and so F is continuous on [a, x] and differentiable on (a, x). Note
that F(x) = f(x) [since all but the first term are 0 when we plug in t = x] and F(a)
= f(x) [by the choice of A_n(x)]. Thus the function of t given by F(t) − f(x) is 0 at
both a and x. Applying Rolle's theorem, there is a point c ∈ (a, x) with F'(c) = 0.
Now


\[
\begin{aligned}
F'(t) = {} & f'(t) + [f''(t)(x - t) - f'(t)] + \cdots \\
& + \left[ \frac{f^{(n+1)}(t)(x - t)^n}{n!} - \frac{f^{(n)}(t)(x - t)^{n-1}}{(n-1)!} \right] - A_n(x)(n + 1)(x - t)^n.
\end{aligned}
\]


Most of the terms in this expression cancel, leaving


\[ F'(t) = \frac{f^{(n+1)}(t)(x - t)^n}{n!} - A_n(x)(n + 1)(x - t)^n. \]


Plugging in t = c, setting the left side equal to 0, and solving for A_n(x) gives us
the desired result. ■


We will set R_n(x) = f(x) − T_n(x). [R_n(x) is the "remainder" upon approximating
f(x) with T_n(x).] Then Taylor's theorem says there is a number c between a and x
so that


\[ R_n(x) = \frac{f^{(n+1)}(c)}{(n+1)!} (x - a)^{n+1}. \]


EXAMPLES 16.6: 1. Let f(x) = sin x and a = 0. Then T₁(x) = x. Since |f''(x)| ≤ 1
for all x, we have |R₁| ≤ |x|²/2. The approximation of sin x by x is thus O(x²) as x
→ 0. Since f''(0) = 0, we also have T₂(x) = x, and since it is also the case that
|f'''(x)| ≤ 1, we have |R₂| ≤ |x|³/6. The approximation by the tangent line is thus
O(x³), even better than expected.
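
The bound |sin x − x| ≤ |x|³/6 from this example is easy to spot-check numerically. The sample points below are arbitrary, and the check is only an illustration of the estimate, not a substitute for Taylor's theorem.

```python
import math

checks = []
for x in (1.0, 0.5, 0.1, 0.01):
    error = abs(math.sin(x) - x)   # |R_2(x)|, since T_1(x) = T_2(x) = x
    bound = abs(x) ** 3 / 6        # the bound |x|^3 / 6 from the example
    checks.append((x, error, bound))
    print(f"x={x:<5} error={error:.3e}  bound={bound:.3e}")
```

In each row the error sits just below the bound, and the two become nearly equal as x shrinks, which suggests the exponent 3 cannot be improved.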


2. The tangent line to a function f is T₁. For n = 1, Taylor's theorem says f(x) −
T₁(x) = f''(c)(x − a)²/2. If it happens that f''(c) is bounded, then f(x) − T₁(x) =
O((x − a)²), and so f(x) − T₁(x) = o(x − a), which brings us back to the definition
of the derivative. The similarity between Taylor's theorem and the definition of
the derivative raises a question: Can a statement of the form of Taylor's theorem
be used to define higher derivatives? You will explore this in Exercise 16.6.6.


EXERCISES 16.6 
1. (a) Use Taylor's theorem with n = 3 to find an estimate for √65 [this is the
function f(x) = √x evaluated at x = 65] and give an estimate of the error.


(b) What degree of Maclaurin polynomial is needed to estimate e? with an 
error of less than 0.001? 


2. (a) Use Taylor's theorem to prove the Second Derivative test. 


(b) Extend the Second Derivative test. If f'' = 0, are there conditions under
which the conclusions can still be drawn? 


3. (a) Show that T_n and f match at the point a for their first n derivatives.


(b) Suppose f(x) is a polynomial of degree n. Show without doing any
calculations that T_m(x) = f(x) for m ≥ n.


4. Verify the statement in the proof of Taylor's theorem that "Most of the terms 
in this expression cancel." 


5. Recall that Taylor's theorem says that, under appropriate conditions, f(x) −
(f(a) + f'(a)(x − a) + (f''(a)/2)(x − a)²) = o((x − a)²) as x → a.


Suppose f satisfies the hypotheses of Taylor's theorem for n = 2, and that A ≠
f''(a). Show that


f(x) − (f(a) + f'(a)(x − a) + (A/2)(x − a)²) ≠ o((x − a)²).


6. Discuss how the observation in Exercise 16.6.5 could be used to define the 
second (and higher) derivatives. 


7. Describe in words what is going on in Example 16.6.2. Does a bound on the 
second derivative have implications for the shape of a graph? Does your 
answer describe functions such as f(x) = x², whose second derivative is
constant but whose shape seems to change from place to place? Look up 
"curvature" in a calculus book. 


16.7 TAYLOR SERIES 


The form of Taylor polynomials suggests a natural question: What if we simply 
continue computing terms forever? 


DEFINITION 16.16: If the function f has derivatives of all orders at a, the
Taylor series for f at a is


\[ \sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!} (x - a)^n \]


(as before, this is a Maclaurin series if a = 0).

For instance, the Maclaurin series for f(x) = e^x is Σ_{n=0}^{∞} xⁿ/n! since f^(n)(0) = 1 for all
n. There is only one question of real interest: Is the Taylor series of a function 
equal to the function? The next example shows that it is possible for the answer 
to be no in quite dramatic fashion. 


EXAMPLES 16.7: 1. Let f(x) = e^(−1/x²) if x ≠ 0 and f(0) = 0. You will show in
Exercise 16.7.1 that f^(n)(0) = 0 for all n. Thus the Maclaurin series for f is
identically 0, but the function certainly is not. The series is equal to the function
only at x = 0. This function is often referred to as "infinitely flat" at 0.
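
One way to get a feel for "infinitely flat" is to compare f(x) with a fixed power of x near 0: the ratio f(x)/xⁿ collapses toward 0 even for large n. The sketch below is only a numerical illustration; the power n = 10, the sample points, and the name `flat` are arbitrary assumptions.

```python
import math

def flat(x):
    """The function of the example: exp(-1/x^2) for x != 0, and 0 at x = 0."""
    return math.exp(-1.0 / (x * x)) if x != 0 else 0.0

n = 10                      # any fixed power; 10 is an arbitrary choice
xs = [0.5, 0.2, 0.1]
flat_ratios = [flat(x) / x ** n for x in xs]

for x, r in zip(xs, flat_ratios):
    print(f"x={x}  f(x)={flat(x):.3e}  f(x)/x^{n}={r:.3e}")
```

Much smaller sample points are not useful here, because exp(−1/x²) underflows to 0.0 in floating-point arithmetic long before x reaches 0.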


If g is any function that is equal to its Maclaurin series (such a function is called 
analytic, but we don't know yet that there are any!), and f is as above, then (g +
f)^(n)(0) = g^(n)(0) for all n, and so g and g + f have the same Maclaurin series. But
g(x) = (g + f)(x) only for x = 0. We can't tell whether a function is analytic just
by looking at its series, but the following corollary to Taylor's theorem helps us 
make this decision. We need only observe that the Taylor polynomials of a 
function are the partial sums of its Taylor series to obtain: 


COROLLARY 16.17: If R_n(x) is as in Theorem 16.15 and R_n(x) → 0 as n → ∞, then


\[ f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!} (x - a)^n. \]


EXAMPLES 16.7: 2. If f(x) = e^x, then R_n(x) = e^c x^(n+1)/(n + 1)! for some c
between 0 and x. If x > 0, we have e^c < e^x, so R_n(x) < e^x x^(n+1)/(n + 1)!. We
have seen elsewhere that Σ x^(n+1)/(n + 1)! converges for all values of x. By the
nth Term test, lim_{n→∞} x^(n+1)/(n + 1)! = 0, so R_n(x) → 0 for all x > 0. Thus
e^x = Σ_{n=0}^{∞} xⁿ/n! for x > 0. (The conclusion
is much easier to reach for x < 0.)
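
The estimate in this example can be watched in action. The sketch below, with the hypothetical helper `maclaurin_exp` and the arbitrary choice x = 2, compares the actual remainder e^x − T_n(x) with the bound e^x x^(n+1)/(n + 1)! for a few values of n.

```python
import math

def maclaurin_exp(x, n):
    """Partial sum T_n(x) = sum_{k=0}^{n} x^k / k! of the Maclaurin series for e^x."""
    return sum(x ** k / math.factorial(k) for k in range(n + 1))

x = 2.0
report = []
for n in (2, 5, 10, 15):
    remainder = math.exp(x) - maclaurin_exp(x, n)               # R_n(x)
    bound = math.exp(x) * x ** (n + 1) / math.factorial(n + 1)  # e^x x^(n+1)/(n+1)!
    report.append((n, remainder, bound))
    print(f"n={n:<3} R_n={remainder:.3e}  bound={bound:.3e}")
```

Both columns shrink rapidly as n grows, in line with R_n(x) → 0.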


The hypothesis of our final corollary is quite restrictive but the result is still 
useful in many contexts. 


COROLLARY 16.18: If there is a number M so that |f^(n)(t)| ≤ M for all n and
for all t between a and x, then f is equal to its Taylor series between a and x. ■


EXERCISES 16.7 


1. Let f(x) = e^(−1/x²) if x ≠ 0 and f(0) = 0. Show that f^(n)(0) = 0 for all n.
2. (a) Find the Maclaurin series for sin x and cos x and show that they converge 
for all x to their respective functions. 


(b) Notice that, if you were to change all of the signs in the series for sin x
and cos x to + and add the results together, you would get the series for e^x.
Does this mean anything?


3. (a) Find the Maclaurin series for In(1 + x) and discuss its convergence. 


(b) True or False: The alternating harmonic series converges to In(2). 


4. (a) Use the result of Exercise 4.5.17 to expand (1 + x/n)ⁿ.


(b) Recall from calculus that the limit of this sequence is e^x. Compare the
result in (a) to the Maclaurin series e^x = 1 + x + x²/2 + x³/6 + ....


5. (a) If we let f(x) = e^x, then the "infinitely flat" function is (essentially)
f(−1/x²). Notice that f(x) has a horizontal asymptote as x → −∞. Show that


this is not enough to make /{(—1/x’) infinitely flat by considering f(z) = Fe, 


(b) Find another infinitely flat function. 
(c) What aspect of the behavior of e^x as x → −∞ does make e^(−1/x²) infinitely
flat?


6. (a) Construct a function that is infinitely differentiable everywhere and is
infinitely flat at more than one point. 


(b) Construct a function that is infinitely differentiable everywhere and is 
infinitely flat at each integer. 


(c) (Research!) At how many points can an infinitely differentiable function 
be infinitely flat and yet not be constant? 


7. On the graph of a typical parabola, a secant line connecting two points on the
curve lies entirely above the curve, like this: 


A graph with this property is said to be convex. 


(a) Show that if f is convex and differentiable, then any tangent line to f lies
entirely below the graph.


(b) Show that if f is convex and differentiable, then f' is increasing.
(c) Show that if f is convex and twice differentiable, then f'' is non-negative.
(d) If f is convex (differentiable or not), and a < b < c < d, show that


\[ \frac{f(b) - f(a)}{b - a} \le \frac{f(d) - f(c)}{d - c}. \]


Interpret this in words. 


8. (a) Give an example of a discontinuous convex function f : [0, 1] → R.


(b) Show that a convex function whose domain is an open interval must be 
continuous. 


(c) Show that if f is known to be continuous, it is sufficient to require that the
midpoint of any segment connecting points on the curve is above the curve
for f to be convex.


9. A region in the plane is called convex if the line segment connecting any two
points in the region lies entirely within the region. 


(a) Draw a picture to illustrate this. 


(b) Show that a function is convex if and only if the region above its graph is 
convex (note that "convex" has two different meanings in this sentence). 


(c) Is the union (or intersection) of two convex regions necessarily convex? 


(d) Explain why this definition of convex makes sense for regions in any 
vector space. 


(e) Does the definition of a convex function make sense for functions whose
domains or ranges are vector spaces? 


(f) Which subsets of the real line are convex? 


10. A region in the plane is called star convex if it contains a point p so that the 
line segment connecting any point of the region to p lies entirely within the 
region. 


(a) Draw a picture to illustrate why this is an appropriate name for such a 
region. 


(b) Show that any convex region is also star convex. 


11. (a) If V is a vector space and ‖ · ‖ is a norm on V (see Exercise 15.2.9), show
that the set B = {v ∈ V : ‖v‖ ≤ 1} is convex (B is called the unit ball in V).


(b) Show that B has the property: (v ∈ B) ⇔ (−v ∈ B).


(c) A set with the property in (b) is called balanced. Say why this is an 
appropriate term. 


(d) Show that B has the property: Vv € Vak €¢ Ra(ku € B), 


(e) A set with the property in (d) is called absorbing. Say why this is an
appropriate term. 


(f) Draw pictures to show that the properties "convex," "balanced," and 
"absorbing" are independent. (You will need to draw three pictures, each of 
which is a set having two of the properties but not the third.) 


(g) Suppose C is a subset of a vector space V that is closed, convex, balanced,
and absorbing. For v ∈ V, let μ_C(v) = inf{k > 0 : v ∈ kC}. This is called the
Minkowski functional of C. Show that μ_C is a norm on V and that C is its
unit ball.


12. (a) If φ : [c, d] → R is a convex function and x₁, x₂, ..., x_n ∈ [c, d], show that


\[ \varphi\!\left( \frac{x_1 + x_2 + \cdots + x_n}{n} \right) \le \frac{\varphi(x_1) + \varphi(x_2) + \cdots + \varphi(x_n)}{n}. \]


(b) If φ is as in (a) and f : [0, 1] → [c, d], show Jensen's Inequality:


\[ \varphi\!\left( \int_0^1 f(x)\,dx \right) \le \int_0^1 \varphi(f(x))\,dx. \]


(c) What is the significance of the fact that the domain of f in part
(b) is [0, 1]? Modify the result so that this is not necessary.


(d) Show that

\[ \left( \int_0^1 \sin(x)\,dx \right)^2 \le \int_0^1 \sin^2 x\,dx. \]


¹ If a = 0, this is called a Maclaurin Polynomial. This doesn't mean that Maclaurin became famous by
setting a = 0. Among his other accomplishments, Maclaurin was the author of the first calculus text in 
English (originals of which can still be found in rare-book stores). 


Chapter 17 


Integration 


17.1 UPPER AND LOWER RIEMANN INTEGRALS 


In calculus we learned to compute Riemann integrals and examined some of 
their uses. While techniques and applications of integration are a major part of a 
calculus course, these are not our concern here. Our goal now is to describe 
carefully the limit process involved in the definition of the integral and establish 
some of the major theorems of the subject. As we did in Chapter 16, we will 
adopt a slightly different approach from that usually taken in calculus texts. This 
method, which highlights the role of the completeness of the real numbers and 
smoothes out many of the proofs, is accessible to us now thanks to our 
knowledge of infima and suprema and our familiarity with the Cauchy criterion. 
The idea of examining the integral through upper and lower estimates was 
developed by Darboux about 20 years after Riemann's original work. In the 
setting of the real numbers, however, the results are the same (Theorem 17.18), 
and the process is still usually called "Riemann integration," although "Riemann- 
Darboux integration" would be more appropriate (and some people do use this 
name). Since we are familiar with the major characters in this story, we can jump 
right in. 


DEFINITION 17.1: (a) A partition of the closed interval [a, b] is a set P = {x_0, 
x_1, ..., x_n} with a = x_0 < x_1 < ... < x_n = b. The intervals [x_{k−1}, x_k] are called the 
subintervals of [a, b] given by P. 

(b) A partition P′ is a refinement of P if P ⊆ P′. 


DEFINITION 17.2: If f is bounded and P = {x_0, x_1, ..., x_n}, let 

\[ M_k = \sup\{f(x) : x \in [x_{k-1}, x_k]\} \quad\text{and}\quad m_k = \inf\{f(x) : x \in [x_{k-1}, x_k]\}, \qquad k = 1, \ldots, n. \]

Then the upper and lower Riemann sums for f over [a, b] with partition P are 
given by 

\[ U(f, P) = \sum_{k=1}^{n} M_k\,\Delta x_k \quad\text{and}\quad L(f, P) = \sum_{k=1}^{n} m_k\,\Delta x_k, \]

respectively (where Δx_k = x_k − x_{k−1}). 


Since m_k ≤ M_k for all k (Exercise 5.1.4), it is clear that L(f, P) ≤ U(f, P) for any 
partition P. But this inequality between upper and lower sums holds in an even 
stronger sense. 


LEMMA 17.3: (a) If P′ is a refinement of P, then L(f, P) ≤ L(f, P′) and U(f, P′) ≤ 
U(f, P). 

(b) If P_1 and P_2 are any two partitions, L(f, P_1) ≤ U(f, P_2). 

PROOF: (a) Partitions are, by definition, finite, so any refinement P′ is obtained 
from P by adding finitely many points. We will prove the result when P′ is 
obtained by adding one point to P. The general case will follow by induction. 


Suppose P′ = P ∪ {x*} and x_{k−1} < x* < x_k. The contributions to the lower sums 
constructed with P and P′ are the same from all intervals except [x_{k−1}, x_k], and 

\[
\begin{aligned}
\Bigl[\inf_{x \in [x_{k-1}, x_k]} f(x)\Bigr](x_k - x_{k-1})
&= \Bigl[\inf_{x \in [x_{k-1}, x_k]} f(x)\Bigr](x_k - x^*) + \Bigl[\inf_{x \in [x_{k-1}, x_k]} f(x)\Bigr](x^* - x_{k-1}) \\
&\le \Bigl[\inf_{x \in [x^*, x_k]} f(x)\Bigr](x_k - x^*) + \Bigl[\inf_{x \in [x_{k-1}, x^*]} f(x)\Bigr](x^* - x_{k-1}).
\end{aligned}
\]

The last inequality holds by Exercise 5.1.6. It follows that L(f, P) ≤ L(f, P ∪ {x*}). 
Upper sums are handled similarly. 


(b) Let P = P_1 ∪ P_2. This is called the common refinement of P_1 and P_2. Then 
by part (a) and the observation before the lemma, 

\[ L(f, P_1) \le L(f, P) \le U(f, P) \le U(f, P_2). \]
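
The chain of inequalities in Lemma 17.3 can be checked on a small case. The following is only a sketch under a simplifying assumption that is not part of the lemma: the function is increasing, so the infimum and supremum on each subinterval sit at its endpoints. The helper name `lower_upper_sums`, the test function `square`, and the two partitions are illustrative choices.

```python
from fractions import Fraction


def lower_upper_sums(f, partition):
    """Return (L(f, P), U(f, P)) for an INCREASING function f.

    Assumption: f is increasing, so on [x_{k-1}, x_k] the infimum is
    f(x_{k-1}) and the supremum is f(x_k).
    """
    pairs = list(zip(partition, partition[1:]))
    lower = sum(f(left) * (right - left) for left, right in pairs)
    upper = sum(f(right) * (right - left) for left, right in pairs)
    return lower, upper


def square(x):
    return x * x


# A partition of [0, 1] and a refinement obtained by adding two points.
P = [Fraction(0), Fraction(1, 2), Fraction(1)]
P_refined = sorted(set(P) | {Fraction(1, 4), Fraction(3, 4)})

L_P, U_P = lower_upper_sums(square, P)
L_ref, U_ref = lower_upper_sums(square, P_refined)
```

Exact rational arithmetic is used so that the comparison L(f, P) ≤ L(f, P′) ≤ U(f, P′) ≤ U(f, P) is not blurred by rounding.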


Lemma 17.3 confirms the impression given by the familiar pictures used to 
describe the integral in calculus, that making a refinement of a partition makes 
any upper sum smaller and any lower sum larger. More importantly, Lemma 17.3 
tells us that, as long as the function f is bounded, the collection of lower sums is 
bounded above (by any of the upper sums) and the collection of upper sums is 
bounded below (by any of the lower sums). The Least Upper Bound property 
then allows us to make the following definition. 


DEFINITION 17.4: If f: [a, b] → R is bounded, then the upper and lower 
Riemann integrals for f over [a, b] are U(f) = inf U(f, P) and L(f) = sup L(f, P), 
respectively, where the infimum and supremum are taken over all partitions P of 
[a, b]. 


The next theorem follows from Lemma 17.3 and Exercise 5.1.11. 


THEOREM 17.5: L(f) ≤ U(f). ∎ 


DEFINITION 17.6: The function f is Riemann integrable (or simply 
integrable) on [a, b], with integral I, if L(f) = U(f) = I. If this is the case, we 
write $\int_a^b f(x)\,dx = I$. 


Notice that there does not seem to be a limit process involved in the definition of 
the integral! (At least the limit process is well hidden.) 


EXAMPLES 17.1: 1. If f(x) = C for all x, then L(f, P) = U(f, P) = C(b − a) for 
any partition P. Thus any constant function is integrable, with integral C(b — a). 


2. Let f(x) = 0 for x ≠ 0 and f(0) = 1. We will show that $\int_{-1}^{1} f(x)\,dx = 0$. Since m_k = 
0 for any interval, we have L(f, P) = 0 for any partition, and so L(f) = 0. We may 
assume each partition has 0 as an element (since adding a point to a partition 
produces a refinement of it). Now if 0 ∉ [x_{k−1}, x_k], we have M_k = 0. If x_j = 0, then 
M_j = M_{j+1} = 1 and U(f, P) = x_{j+1} − x_{j−1}. Now x_{j+1} − x_{j−1} > 0, and it can be made 
as small as we wish by refining P. Thus U(f) = 0 = L(f). 


3. Let D(x) be the Dirichlet function over the interval [0, 1]. Then, for any 
partition P, U(D, P) = 1 and L(D, P) = 0. Thus U(D) = 1 and L(D) = 0, and D is 
not integrable over [0, 1]. 


4. We verify that $\int_0^1 x\,dx = 1/2$. Let P_n = {0, 1/n, 2/n, ..., 1} and examine upper and 
lower sums based on P_n. Since f(x) = x is increasing, we have m_k = (k − 1)/n and 
M_k = k/n for all k. Then 

\[ L(f, P_n) = \sum_{k=1}^{n} \frac{k-1}{n}\cdot\frac{1}{n} = \frac{1}{n^2}\sum_{k=1}^{n} (k-1) = \frac{1}{n^2}\cdot\frac{n(n-1)}{2} = \frac{1}{2} - \frac{1}{2n}. \]

Similarly, U(f, P_n) = 1/2 + 1/(2n). We know, then, that 

\[ \frac{1}{2} - \frac{1}{2n} \le L(f) \le U(f) \le \frac{1}{2} + \frac{1}{2n} \]

for any n. It follows that L(f) = U(f) = 1/2. 
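
The squeeze in Example 4 can be reproduced exactly. This is a small sketch, not part of the text's argument; the function name `sums_for_identity` is an illustrative choice, and the formulas m_k = (k − 1)/n and M_k = k/n are taken from the example.

```python
from fractions import Fraction


def sums_for_identity(n):
    """Exact L(f, P_n) and U(f, P_n) for f(x) = x on [0, 1].

    P_n is the uniform partition {0, 1/n, 2/n, ..., 1}, so every
    subinterval has length 1/n, m_k = (k - 1)/n and M_k = k/n.
    """
    dx = Fraction(1, n)
    lower = sum(Fraction(k - 1, n) * dx for k in range(1, n + 1))
    upper = sum(Fraction(k, n) * dx for k in range(1, n + 1))
    return lower, upper
```

For each n the two sums straddle 1/2 and differ by exactly 1/n, which is why letting n grow pins down both L(f) and U(f).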
The argument of Example 4 can be generalized, giving us the following result. 


THEOREM 17.7: Suppose there exists a collection of partitions {P_n} with inf 
U(f, P_n) = sup L(f, P_n) = I. Then $\int_a^b f(x)\,dx = I$. If, furthermore, U(f, P_{n+1}) ≤ U(f, 
P_n) [or L(f, P_{n+1}) ≥ L(f, P_n)] for all n, then I = lim_{n→∞} U(f, P_n) [or I = lim_{n→∞} 
L(f, P_n)]. ∎ 


Theorem 17.7 is our first real indication that the integral might be computed 
using a limit (in certain circumstances, the limit of a sequence). Keep in mind 
that we are usually interested primarily in whether an integral exists and not so 
much in its value if it does. The Cauchy criterion has always been a powerful 
tool for obtaining information of this type. Here is a version of it for integrals. 


THEOREM 17.8: f: [a, b] → R is integrable if and only if, for any ε > 0, there 
is a partition P so that U(f, P) − L(f, P) < ε. 


PROOF: This follows from Exercise 5.1.11 and Lemma 17.3. ∎ 
EXERCISES 17.1 


1. Show in detail that $\int_0^1 x^2\,dx = 1/3$. 


2. Convince yourself that the containment in the definition of "refinement" goes 
the right way. 


3. We have assumed that the functions we are dealing with are bounded. Show 
that a function that is Riemann integrable must be bounded. 


4. If f is integrable on [a, b] and a < c < b, show that f is integrable on [a, c] 
and on [c, b] and that $\int_a^c f(x)\,dx + \int_c^b f(x)\,dx = \int_a^b f(x)\,dx$. 


5. (a) If f: [a, b] → R is integrable, show that the restriction of f to any interval 
[c, d] ⊆ [a, b] is integrable. 


(b) Let f: [0, 1] → R. If f is integrable on [ε, 1] for every 0 < ε < 1, is it 
necessarily true that f is integrable on [0, 1]? 

(c) Can a hypothesis be added in (b) to make the conclusion true? 

(d) If f: [a, b] → R is such that the restriction of f to any interval [c, d] that is 
properly contained in [a, b] is integrable, is f necessarily integrable? Can a 
condition be added to make it so? 


6. Draw a picture to illustrate the argument in Example 17.1.2. 
7. What theorem about the real line is used in Example 17.1.3? 
8. Show that the function 


r= 
is integrable on [0, 1]. 


9. The average value of a collection of numbers a_1, a_2, ..., a_n is that number, ā, 
so that if each term in a_1 + a_2 + ... + a_n is replaced by ā, the sum remains the 
same. Explain why it is reasonable in this sense to define the average value 
of a function f: [a, b] → R to be $\frac{1}{b-a}\int_a^b f(x)\,dx$. 


17.2 OSCILLATIONS 


The various conditions for integrability we have discussed each have their own 
conceptual advantages. The definition itself suggests the familiar integration 
process but requires evaluation of suprema, infima, or limits. Theorem 17.8 
doesn't seem to involve any limit process, but it's hard to see it as "integration." 
Since U(f, P) − L(f, P) = $\sum (M_k - m_k)\,\Delta x_k$, we might simply ask the latter sum to 
be small. This gives us only one sum to consider. The difference M_k − m_k is 
called the oscillation of f over [x_{k−1}, x_k], denoted osc(f, [x_{k−1}, x_k]). With this 


terminology, the condition in Theorem 17.8 becomes: 


COROLLARY 17.9: The function f : [a, b] → R is integrable if and only if, for 
any ε > 0, there is a partition P so that 

\[ \sum_{k=1}^{n} \operatorname{osc}(f, [x_{k-1}, x_k])\,\Delta x_k < \varepsilon. \qquad ∎ \]


EXAMPLES 17.2: 1. Here we show again that the function f(x) = x is integrable 
on [0, 1]. For any interval [x_{k−1}, x_k], we have osc(f, [x_{k−1}, x_k]) = x_k − x_{k−1}. Thus, 
for any partition P, 

\[
\begin{aligned}
\sum_{k=1}^{n} \operatorname{osc}(f, [x_{k-1}, x_k])\,\Delta x_k
&= \sum_{k=1}^{n} (x_k - x_{k-1})\,\Delta x_k \\
&\le \Bigl(\max_{1 \le k \le n}\{x_k - x_{k-1}\}\Bigr) \sum_{k=1}^{n} \Delta x_k \\
&= \max_{1 \le k \le n}\{x_k - x_{k-1}\},
\end{aligned}
\]

since $\sum \Delta x_k = 1$. If ε > 0 is given, the condition of Corollary 17.9 is 
satisfied by any partition with max_{1≤k≤n}{x_k − x_{k−1}} < ε. The quantity 
max_{1≤k≤n}{x_k − x_{k−1}} is called the mesh of P, denoted μ(P). 
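
The estimate in this example does not depend on the partition being uniform. As a hedged illustration, the sketch below builds an irregular partition of [0, 1] and compares the oscillation sum for f(x) = x with the mesh. The names `mesh`, `oscillation_sum_identity` and `random_partition`, and the use of a seeded random generator, are assumptions made for the example only.

```python
import random
from fractions import Fraction


def mesh(partition):
    """Length of the longest subinterval of a sorted partition."""
    return max(right - left for left, right in zip(partition, partition[1:]))


def oscillation_sum_identity(partition):
    """Sum of osc(f, [x_{k-1}, x_k]) * dx_k for f(x) = x.

    For f(x) = x the oscillation on a subinterval equals its length.
    """
    return sum((right - left) ** 2 for left, right in zip(partition, partition[1:]))


def random_partition(n_interior, seed=0):
    """An irregular partition of [0, 1] with at most n_interior interior points."""
    rng = random.Random(seed)
    interior = {Fraction(rng.randint(1, 999), 1000) for _ in range(n_interior)}
    return sorted({Fraction(0), Fraction(1)} | interior)
```

For a uniform partition the two quantities coincide; for an irregular one the oscillation sum is at most the mesh.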


We make a few observations about oscillations: 


LEMMA 17.10: (a) If f(x) − f(y) ≤ B for all x, y ∈ S, then osc(f, S) ≤ B. 

(b) If f is continuous and S is compact, there are points x, y ∈ S so that osc(f, S) = 
f(x) − f(y). 

(c) osc(f + g, S) ≤ osc(f, S) + osc(g, S). 

(d) If c ∈ R, osc(cf, S) = |c|(osc(f, S)). 


PROOF: We will prove (a), leaving the rest as Exercise 17.2.2. We use the 
technique of Exercise 4.6.13. Let u = sup_{x∈S} f(x) and v = inf_{x∈S} f(x). Let ε > 0 
be given. There are points x, y ∈ S so that f(x) + ε/2 > u and f(y) − ε/2 < v. Thus 
for any ε > 0, we have 

\[ \operatorname{osc}(f, S) = u - v < f(x) - f(y) + \varepsilon \le B + \varepsilon. \]

By Exercise 4.6.13, osc(f, S) ≤ B. ∎ 
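
Parts (c) and (d) of Lemma 17.10 hold for any set S, so they can be checked directly when S is finite, where the supremum and infimum are just a maximum and a minimum. The sketch below does this for one choice of f, g and c; all names and the particular set S are illustrative assumptions.

```python
from fractions import Fraction


def osc(f, points):
    """Oscillation of f over a finite set: max of f minus min of f."""
    values = [f(x) for x in points]
    return max(values) - min(values)


def f(x):
    return x * x


def g(x):
    return -x


def f_plus_g(x):
    return f(x) + g(x)


def scaled_f(x, c=-3):
    return c * f(x)


# S is the finite set {-1, -0.9, ..., 0.9, 1}, stored as exact fractions.
S = [Fraction(k, 10) for k in range(-10, 11)]
```

Here osc(f + g, S) comes out strictly smaller than osc(f, S) + osc(g, S), which shows that the inequality in (c) can be strict, while (d) holds with equality.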


THEOREM 17.11: If f: [a, b] → R and g: [a, b] → R are integrable and c ∈ R, 
then f + g and cf are integrable and 

\[ \int_a^b \bigl(f(x) + g(x)\bigr)\,dx = \int_a^b f(x)\,dx + \int_a^b g(x)\,dx \]

and 

\[ \int_a^b c f(x)\,dx = c\int_a^b f(x)\,dx. \]


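PROOF: We will prove that f + g is integrable and leave cf and the formulas as 
Exercise 17.2.3 (proving such a formula is different from showing a function is 
integrable). Let ε > 0 be given and let P be such that $\sum \operatorname{osc}(f, [x_{k-1}, x_k])\,\Delta x_k < \varepsilon/2$ and 
$\sum \operatorname{osc}(g, [x_{k-1}, x_k])\,\Delta x_k < \varepsilon/2$. (Partitions that work for f and for g separately can be 
replaced by their common refinement.) By Lemma 17.10, 

\[
\sum \operatorname{osc}(f + g, [x_{k-1}, x_k])\,\Delta x_k
\le \sum \operatorname{osc}(f, [x_{k-1}, x_k])\,\Delta x_k + \sum \operatorname{osc}(g, [x_{k-1}, x_k])\,\Delta x_k
< \varepsilon,
\]

and so f + g is integrable. ∎ 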


EXERCISES 17.2 
1. Prove Corollary 17.9. 
2. Complete the proof of Lemma 17.10. 
3. Prove Theorem 17.11. 
4. Suppose ‖f − g‖_∞ < ε on S. Show that 
osc(g, S) − 2ε ≤ osc(f, S) ≤ osc(g, S) + 2ε. 


5. (a) If f: [a, b] → R is Riemann integrable, show that the function f² is 
integrable on [a, b]. (Hint: Consider the oscillation of f². Recall that an 
integrable function is bounded.) 


(b) Using the identity (f + g)² = f² + 2fg + g², show that the product of two 
Riemann integrable functions is Riemann integrable. 


17.3 INTEGRABILITY OF CONTINUOUS FUNCTIONS 


We turn to the main theorem of the chapter. In abstract theories of integration 
one is not generally concerned with the value of a specific integral. The 
important questions are whether a function is integrable or not and whether the 
class of all integrable functions can be identified. We begin that process now. 


THEOREM 17.12: If f: [a, b] → R is continuous, then f is Riemann integrable. 


PROOF: Since [a, b] is compact, f is uniformly continuous (the intrusion of 
uniform continuity puts this proof "beyond the scope" of elementary calculus). 
Then given ε > 0, there is a δ > 0 so that |f(x) − f(y)| < ε whenever d − c < δ and 
x, y ∈ [c, d]. According to Lemma 17.10, this means that osc(f, [c, d]) ≤ ε for any 
such [c, d]. This is just what we need. Let ε > 0 be given and δ > 0 such that |x − 
y| < δ implies |f(x) − f(y)| < ε/(b − a). Choose a partition P = {x_0, ..., x_n} with 
μ(P) < δ. Then 

\[
\sum_{k=1}^{n} \operatorname{osc}(f, [x_{k-1}, x_k])\,\Delta x_k
\le \sum_{k=1}^{n} \frac{\varepsilon}{b - a}\,\Delta x_k
= \frac{\varepsilon}{b - a} \sum_{k=1}^{n} \Delta x_k
= \varepsilon,
\]

since $\sum \Delta x_k = b - a$. Since ε > 0 was arbitrary, f is integrable by Corollary 17.9. ∎ 
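
The proof turns a modulus of uniform continuity into a mesh size. A rough sketch of that recipe follows, under assumptions that go beyond the theorem: the function is f(x) = x² on [0, 2], which is increasing (so its oscillation on a subinterval is the difference of its endpoint values) and satisfies |f(x) − f(y)| ≤ 4|x − y| there, so δ = ε/(4(b − a)) works. All function names are illustrative.

```python
import math
from fractions import Fraction


def square(x):
    return x * x


def uniform_partition(a, b, n):
    """The partition of [a, b] into n subintervals of equal length."""
    return [a + (b - a) * k / n for k in range(n + 1)]


def partition_for_epsilon(a, b, lipschitz, eps):
    """Uniform partition with mesh below delta = eps / (lipschitz * (b - a)).

    Assumption: |f(x) - f(y)| <= lipschitz * |x - y| on [a, b], so that
    |x - y| < delta forces |f(x) - f(y)| < eps / (b - a).
    """
    delta = eps / (lipschitz * (b - a))
    n = math.floor((b - a) / delta) + 1  # guarantees (b - a) / n < delta
    return uniform_partition(a, b, n)


def oscillation_sum_increasing(f, partition):
    """Sum of osc(f, [x_{k-1}, x_k]) * dx_k for an increasing f."""
    return sum((f(right) - f(left)) * (right - left)
               for left, right in zip(partition, partition[1:]))
```

With ε = 1/100 this produces 1601 subintervals, and the oscillation sum is 8/1601, comfortably below ε, as Corollary 17.9 requires.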


In Example 17.1.2 we saw that a discontinuous function can be integrable but 
that something as wildly discontinuous as the Dirichlet function is not. We will 
find how badly discontinuous a function can be and still be integrable in Chapter 
21, and for now will examine only one more result of this type. A monotone 
function can have many discontinuities (just how many it can have will be 
explored in Chapter 19). Nevertheless ... 


THEOREM 17.13: If f: [a, b] → R is monotone, then f is Riemann integrable. 


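PROOF: We will assume f is increasing. Let P be any partition of [a, b]. For 
each interval [x_{k−1}, x_k], we have M_k = f(x_k) and m_k = f(x_{k−1}). Thus 

\[
\begin{aligned}
\sum_{k=1}^{n} \operatorname{osc}(f, [x_{k-1}, x_k])\,\Delta x_k
&= \sum_{k=1}^{n} \bigl(f(x_k) - f(x_{k-1})\bigr)\,\Delta x_k \\
&\le \mu(P) \sum_{k=1}^{n} \bigl(f(x_k) - f(x_{k-1})\bigr) \\
&= \mu(P)\bigl(f(b) - f(a)\bigr),
\end{aligned}
\]

and Corollary 17.9 will be satisfied by taking any partition P with μ(P) < ε/(f(b) 
− f(a)). (If f(b) = f(a), then f is constant and there is nothing to prove.) ∎ 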
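
The bound μ(P)(f(b) − f(a)) does not require continuity. The sketch below tries it on an increasing function with jumps; the staircase function floor(3x) on [0, 1] and the sample partition are illustrative assumptions, and the helper names are not from the text.

```python
import math
from fractions import Fraction


def staircase(x):
    """An increasing function on [0, 1] with jumps at 1/3, 2/3 and 1."""
    return math.floor(3 * x)


def mesh(partition):
    return max(right - left for left, right in zip(partition, partition[1:]))


def oscillation_sum_increasing(f, partition):
    """Sum of (f(x_k) - f(x_{k-1})) * dx_k, the oscillation sum of an increasing f."""
    return sum((f(right) - f(left)) * (right - left)
               for left, right in zip(partition, partition[1:]))


def monotone_bound(f, partition):
    """The bound mesh(P) * (f(b) - f(a)) from the proof of Theorem 17.13."""
    return mesh(partition) * (f(partition[-1]) - f(partition[0]))


coarse = [Fraction(0), Fraction(1, 10), Fraction(1, 2), Fraction(9, 10), Fraction(1)]
fine = [Fraction(k, 300) for k in range(301)]
```

On the coarse partition the oscillation sum is 9/10 against a bound of 6/5; on the uniform partition with 300 subintervals both equal 1/100, so the sum can be pushed below any ε by shrinking the mesh.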
EXERCISES 17.3 


1. Complete the proof of Lemma 17.3.a. 


2. (a) If f: [a, b] → R is continuous and f(x) ≥ 0 for all x ∈ [a, b], show that 
$\int_a^b f(x)\,dx \ge 0$. 


(b) If f: [a, b] → R is continuous, f(x) ≥ 0 for all x ∈ [a, b], and there is at least 
one number c for which f(c) > 0, show that $\int_a^b f(x)\,dx > 0$. 


(c) Show that (a) remains true if it is assumed only that f is integrable. 


(d) Show that (b) does not remain true if it is assumed only that f is 
integrable. 


3. (a) Prove the Mean Value Theorem for Integrals: If f: [a, b] → R is 
continuous, there is a number c ∈ [a, b] so that 

\[ \int_a^b f(x)\,dx = f(c)(b - a). \]


(b) Describe this result geometrically. 


4. (a) Draw a picture and describe the geometric significance of the expression 

\[ \frac{b - a}{n} \sum_{k=1}^{n} f\!\left(a + k\,\frac{b - a}{n}\right). \]

(b) If f is continuous on [a, b], show that 

\[ \int_a^b f(x)\,dx = \lim_{n \to \infty} \frac{b - a}{n} \sum_{k=1}^{n} f\!\left(a + k\,\frac{b - a}{n}\right). \]


(c) Give an example of a nonintegrable function where the limit in (b) exists. 
(This is why we can't use this as the definition of the integral.) 


5. We say that f: [a, b] → R is a step function if [a, b] can be decomposed into 
finitely many subintervals (some of which might consist of only one point) 
on each of which f is constant. 


(a) Show that a step function has a finite range. (Using this observation, we 
can define a step function for a domain that is bounded but not compact.) 


(b) Show that a step function whose domain is a closed, bounded interval is 
Riemann integrable. 


(c) If f: [a, b] → R is Riemann integrable and ε > 0 is given, show that there 
is a step function g such that $\int_a^b |f(x) - g(x)|\,dx < \varepsilon$. 

(d) If f: [a, b] → R is a step function and ε > 0 is given, show that there is a 
continuous function g such that $\int_a^b |f(x) - g(x)|\,dx < \varepsilon$. 

(e) If f: [a, b] → R is Riemann integrable and ε > 0 is given, show that there 
is a continuous function g such that $\int_a^b |f(x) - g(x)|\,dx < \varepsilon$. 

6. A function f: [a, b] → R is piecewise linear if f is continuous and [a, b] can 
be decomposed into finitely many intervals on each of which f is a straight 
line. 


(a) Draw a graph of a piecewise linear function. 


(b) If f: [a, b] → R is continuous and ε > 0 is given, show that there is a 
piecewise linear function g such that ‖f − g‖_∞ < ε. 


(c) If f: [a, b] → R is continuous and ε > 0 is given, show that there is a 
piecewise linear function g such that $\int_a^b |f(x) - g(x)|\,dx < \varepsilon$. 

(d) If f: [a, b] → R is Riemann integrable and ε > 0 is given, show there is a 
piecewise linear function g such that $\int_a^b |f(x) - g(x)|\,dx < \varepsilon$. 


7. Suppose f: [a, b] — R is continuous and f(x) > 0 for all x. Show that 


\[ \lim_{p \to \infty} \left(\int_a^b (f(x))^p\,dx\right)^{1/p} = \max\{f(x) : x \in [a, b]\}. \]


8. Show that a decreasing function is Riemann integrable. 


9. Suppose f: [a, b] — R is such that [a, b] can be decomposed into a finite 
collection of intervals, on each of which f is monotone. Show that f is 
integrable. 


17.4 THE FUNDAMENTAL THEOREMS 


It is not our purpose to review the computational techniques of calculus. On the 
other hand, the results called the "Fundamental theorems" should not be left 
unmentioned (and their proofs make use of the Big Theorem). Other 
computational devices, so important in elementary calculus, aren't as close to the 
structure of the real numbers, and we won't discuss them here. 


THEOREM 17.14: (The Fundamental Theorem of Calculus) If the function f : 
[a, b] → R is continuous and F : [a, b] → R is such that F′(x) = f(x) for x ∈ (a, b), 
then $\int_a^b f(x)\,dx = F(b) - F(a)$. 


PROOF: Since f is continuous, it is integrable, say with integral I. Let ε > 0 be 
given. We will show that |F(b) − F(a) − I| < ε. Let P = {x_0, ..., x_n} be a partition 
with U(f, P) − L(f, P) < ε. We write F(b) − F(a) in the following way (remember 
a = x_0 and b = x_n): 

\[ F(b) - F(a) = F(x_n) - F(x_{n-1}) + F(x_{n-1}) - \cdots - F(x_1) + F(x_1) - F(x_0). \]

By the Mean Value theorem, there is, for each k, a number ξ_k ∈ (x_{k−1}, x_k) so that 
F(x_k) − F(x_{k−1}) = F′(ξ_k)(x_k − x_{k−1}) = f(ξ_k)(x_k − x_{k−1}). Thus 
F(b) − F(a) = $\sum f(\xi_k)\,\Delta x_k$. Now both L(f, P) ≤ $\sum f(\xi_k)\,\Delta x_k$ ≤ U(f, P) and L(f, P) ≤ 
I ≤ U(f, P). It follows that |F(b) − F(a) − I| = |$\sum f(\xi_k)\,\Delta x_k$ − I| < ε (by Theorem 
4.19). Since this is true for any ε > 0, we have F(b) − F(a) = I. ∎ 
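
The key step of the proof is that the Riemann sum built from the Mean Value points equals F(b) − F(a) exactly, for every partition. A minimal sketch, assuming the specific pair F(x) = x² and f(x) = 2x (for which the Mean Value point of each subinterval is its midpoint); the names and the sample partition are illustrative.

```python
from fractions import Fraction


def F(x):
    return x * x


def f(x):
    return 2 * x


def mean_value_riemann_sum(partition):
    """Sum of f(xi_k) * dx_k, where xi_k is the midpoint of [x_{k-1}, x_k].

    For F(x) = x^2 the midpoint is the point given by the Mean Value
    theorem, since F(r) - F(l) = (r + l)(r - l) = f((l + r) / 2) * (r - l).
    """
    return sum(f((left + right) / 2) * (right - left)
               for left, right in zip(partition, partition[1:]))


# An irregular partition of [1, 3].
P = [Fraction(1), Fraction(3, 2), Fraction(7, 4), Fraction(3)]
```

The sum telescopes to F(3) − F(1) = 8 no matter how the interior points are chosen.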


The Fundamental theorem tells us, in a sense, what happens if we "integrate a 
derivative." The next theorem, called the Second Fundamental Theorem 
(sometimes the billing is reversed), tells us what happens if we "differentiate an 
integral." First we need a lemma, itself a result of some importance. 


LEMMA 17.15: If f: [a, b] → R is integrable, then |f| is integrable and 

\[ \left|\int_a^b f(x)\,dx\right| \le \int_a^b |f(x)|\,dx. \]


PROOF: To establish that |f| is integrable we must first estimate its oscillation. 
By the Triangle inequality, for any x, y ∈ [x_{k−1}, x_k], 

\[ \bigl|\,|f(x)| - |f(y)|\,\bigr| \le |f(x) - f(y)| \le \operatorname{osc}(f, [x_{k-1}, x_k]). \]

It follows by Lemma 17.10 that osc(|f|, [x_{k−1}, x_k]) ≤ osc(f, [x_{k−1}, x_k]). Let ε > 0 be 
given and P be a partition with $\sum \operatorname{osc}(f, [x_{k-1}, x_k])\,\Delta x_k < \varepsilon$. Then 

\[
\sum \operatorname{osc}(|f|, [x_{k-1}, x_k])\,\Delta x_k
\le \sum \operatorname{osc}(f, [x_{k-1}, x_k])\,\Delta x_k
< \varepsilon,
\]

and |f| is integrable by Corollary 17.9. By the Triangle inequality again, |U(f, P)| 
≤ U(|f|, P) and |L(f, P)| ≤ L(|f|, P) for any partition P. The rest of the lemma 
follows. ∎ 


THEOREM 17.16: (The Second Fundamental Theorem) Suppose that f: [a, b] 
→ R is integrable and F : [a, b] → R is defined by 

\[ F(x) = \int_a^x f(t)\,dt. \]

Then 
(a) F(x) is uniformly continuous. 
(b) If f is continuous at a point c, then F is differentiable at c and F′(c) = f(c). 


PROOF: (a) We have defined the Riemann integral only for bounded functions, 
though it might not be clear that this is a necessary restriction. You showed in 
Exercise 17.1.3 that an integrable function must be bounded. Suppose |f(x)| ≤ B 
for all x ∈ [a, b]; then for x, y ∈ [a, b] (say with x ≤ y), 

\[
\begin{aligned}
|F(y) - F(x)| &= \left|\int_a^y f(t)\,dt - \int_a^x f(t)\,dt\right| \\
&= \left|\int_x^y f(t)\,dt\right| && \text{(by Exercise 17.1.4)} \\
&\le \int_x^y |f(t)|\,dt && \text{(by Lemma 17.15)} \\
&\le B\,|y - x|.
\end{aligned}
\]

It follows that F is uniformly continuous. 
(b) Now suppose f is continuous at c. To show that F′(c) = f(c), we must examine 
F(x) − F(c) − f(c)(x − c). Inserting the definition of F, using Exercise 17.1.4, and 
noting that $f(c)(x - c) = \int_c^x f(c)\,dt$, we have 

\[
\begin{aligned}
F(x) - F(c) - f(c)(x - c)
&= \int_a^x f(t)\,dt - \int_a^c f(t)\,dt - \int_c^x f(c)\,dt \\
&= \int_c^x \bigl(f(t) - f(c)\bigr)\,dt.
\end{aligned}
\]

Let η(s) = sup{|f(c + t) − f(c)| : |t| ≤ |s|}. Then η(s) → 0 as s → 0 (since f is continuous at c). 
Then $\left|\int_c^x \bigl(f(t) - f(c)\bigr)\,dt\right| \le \eta(x - c)\,|x - c| = o(x - c)$, and so F′(c) = f(c). ∎ 
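
Both parts of Theorem 17.16 can be seen on a simple integrable function with a jump. The sketch below assumes the interval [−1, 1] and the step function f(t) = 0 for t < 0, f(t) = 1 for t ≥ 0, for which the integral from −1 to x has the closed form max(x, 0). These choices and the helper names are illustrative, not taken from the text.

```python
from fractions import Fraction


def f(t):
    """A bounded integrable function with a jump at 0 (|f| <= B = 1)."""
    return 0 if t < 0 else 1


def F(x):
    """The integral of f from -1 to x, written in closed form."""
    return max(x, 0)


def difference_quotient(func, c, h):
    """(func(c + h) - func(c)) / h for a nonzero step h."""
    return (func(c + h) - func(c)) / h


h = Fraction(1, 1000)
zero = Fraction(0)
half = Fraction(1, 2)
grid = [Fraction(k, 10) for k in range(-10, 11)]
```

F satisfies |F(y) − F(x)| ≤ |y − x| everywhere, as in part (a). At c = 1/2, where f is continuous, the difference quotients equal f(1/2) = 1. At c = 0, where f jumps, the one-sided quotients are 1 and 0, so F is not differentiable there, which shows why part (b) needs continuity of f at c.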


The proof of the (first) Fundamental theorem is quite reminiscent of the 
approach to integration that is familiar from calculus. It seems that the proof 
would have been a bit easier if we could have dealt with the points ξ_k directly, 
without having to make the estimate in the second from last line. This approach 
to the problem is what is usually called "Riemann integration." 


DEFINITION 17.17: (a) A collection of points Ξ = {ξ_1, ..., ξ_n} is called a set of 
intermediate points to the partition P = {x_0, ..., x_n} if x_{k−1} ≤ ξ_k ≤ x_k for k = 1, 
..., n. 

(b) The Riemann sum for f: [a, b] → R over the partition P with intermediate 
points Ξ is $R(f, P, \Xi) = \sum_{k=1}^{n} f(\xi_k)\,\Delta x_k$. 


THEOREM 17.18: $\int_a^b f(x)\,dx = I$ if and only if given ε > 0 there is a partition P 
so that if P′ is any refinement of P and Ξ is any collection of intermediate points 
of P′, then |R(f, P′, Ξ) − I| < ε. 


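PROOF: The "only if" part has been done in the proof of the Fundamental 
theorem. Suppose the condition of the theorem holds and let P be such that |R(f, 
P′, Ξ) − I| < ε/3 for any refinement P′ of P. Since the upper and lower sums are the 
supremum and infimum of the Riemann sums over all choices of intermediate 
points, U(f, P′) − I ≤ ε/3 and I − L(f, P′) ≤ ε/3, and so U(f, P′) − L(f, P′) < ε. ∎ 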


EXERCISES 17.4 


1. (a) Show that "the rest of the lemma follows" as claimed in the proof of 
Lemma 17.15. 


(b) If f is integrable and g is continuous, show that g ∘ f is integrable. 
(c) Show that Lemma 17.15 follows from this. 


2. Let F be defined as in the Second Fundamental theorem: 

\[ F(x) = \int_a^x f(t)\,dt. \]

(a) If f is positive, show that F is increasing. 

(b) If f is increasing, show that F is convex (see Exercise 16.7.6). 

(c) The observation in (b) is often used to construct convex functions with 
specified properties. For instance, show that if f(x) → 0 as x → a⁺, then F(x) 
= o(x) as x → a⁺. 

(d) State and prove a condition on f similar to the one in (c) that will 
guarantee that F(x) = o(x²). 


3. (a) Show that $S(x) = \int_0^x |t|\,dt$ is differentiable at x = 0 but does not have a 
second derivative at x = 0. 


(b) Construct a function g such that g′(a), g″(a), ..., g^{(n)}(a) all exist for some 
point a, but g^{(n+1)}(a) does not exist. 


4. If f and g are continuous functions such that $\int_a^b f(x)\,dx = \int_a^b g(x)\,dx$, show that 
there must be a number c ∈ [a, b] with f(c) = g(c). 


5. Show how the "only if" part of Theorem 17.18 "... has been done in the 
proof of the Fundamental theorem." 


6. Here is an outline of a proof of the Mean Value theorem. Suppose that f(x) 
and g(x) are functions satisfying the hypotheses of the theorem and that g(a) 
= f(a). Let S = [f(b) − f(a)]/(b − a). 


(a) Suppose that g′(x) > S for all x ∈ (a, b). Use the Second Fundamental 
theorem to show that g(b) > f(b). 

(b) Suppose that g′(x) < S for all x ∈ (a, b). Use the Second Fundamental 
theorem to show that g(b) < f(b). 


(c) Show that unless f′(x) = S for all x, it must be the case that there is a point 
α ∈ (a, b) with f′(α) > S and a point β ∈ (a, b) with f′(β) < S. 


(d) Use Exercise 12.6.2 to show that there is a c ∈ (a, b) with f′(c) = S. 


(e) Discuss whether this proof is valid. 


17.5 RIEMANN-STIELTJES INTEGRATION 


Integration as seen in calculus is mainly about areas under curves. We realize as 
we go along that this is not the essence of integration (though it is a very useful 
application). If we aren't tied to this geometric problem, we might ask whether 
the process we have described is the only one that can reasonably be called 
"integration." You have gathered from the title of the section that it is not. The 
Riemann integral is an averaging process of sorts, in which values of a function 
are weighted with the lengths of the subintervals in a partition and added. We 
will alter the process by using a second function to determine the weights 
associated with the subintervals. 


DEFINITION 17.19: (a) If f: [a, b] → R and g: [a, b] → R are bounded, P is a 
partition of [a, b], and Ξ is a collection of intermediate points of P, the 
Riemann-Stieltjes sum of f with respect to g, P, and Ξ is 

\[ R(f, g, P, \Xi) = \sum_{k=1}^{n} f(\xi_k)\bigl(g(x_k) - g(x_{k-1})\bigr). \]
&=1 


(b) f is Riemann-Stieltjes integrable with respect to g with integral I if, given ε 
> 0, there is a partition P so that |R(f, g, P′, Ξ) − I| < ε whenever P′ is a 
refinement of P and for any collection Ξ of intermediate points. If this is so, we 
write 

\[ \int_a^b f\,dg = I. \]


The function g is called the integrator in this expression. 


We have adopted a definition of the Riemann-Stieltjes integral based on 
"intermediate points" because it will make an important theorem easier to prove 
later on. We can still consider "upper and lower sums," though. The proof of the 
following theorem will be left as Exercise 17.5.2. We write Δg_k = g(x_k) − g(x_{k−1}). 


THEOREM 17.20: If g is increasing, f is Riemann-Stieltjes integrable with 
respect to g if and only if, given ε > 0, there is a partition P so that 

\[ U(f, g, P) - L(f, g, P) = \sum M_k\,\Delta g_k - \sum m_k\,\Delta g_k < \varepsilon, \]

where U, L, M_k, and m_k are defined as before. ∎ 


We have an analogue of Corollary 17.9: 


COROLLARY 17.21: If f is bounded and g is increasing, then f is Riemann-Stieltjes 
integrable with respect to g if and only if, for ε > 0, there is a partition P 
= {x_0, x_1, ..., x_n} so that 

\[ \sum_{k=1}^{n} \operatorname{osc}(f, [x_{k-1}, x_k])\,\Delta g_k < \varepsilon. \qquad ∎ \]


EXAMPLES 17.5: 1. If g(x) = x, the Riemann-Stieltjes integral $\int_a^b f\,dg$ is the 
same as the Riemann integral $\int_a^b f(x)\,dx$. 


2. Suppose f: [−1, 1] → R is continuous at 0. Let g(x) = 0 if x ≤ 0 and g(x) = 1 if 
x > 0. Then g(x_k) − g(x_{k−1}) = 0 unless 0 ∈ [x_{k−1}, x_k), and if this is so, g(x_k) − 
g(x_{k−1}) = 1. Then $U(f, g, P) = \sup_{x \in [x_{k-1}, x_k]} f(x)$ and $L(f, g, P) = \inf_{x \in [x_{k-1}, x_k]} f(x)$. Since 
f is continuous at 0, both these values can be made close to f(0) by making x_k − 
x_{k−1} small. It follows that $\int_{-1}^{1} f\,dg = f(0)$. 
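
A Riemann-Stieltjes sum is easy to compute directly from Definition 17.19, and doing so shows how the step integrator of Example 2 picks out the value of f near 0. The following is a sketch with several assumptions that are not in the text: f is taken to be the cosine, the partitions are uniform, and midpoints are used as intermediate points. The names `rs_sum`, `unit_step` and `midpoint_rs_sum` are illustrative.

```python
import math


def rs_sum(f, g, partition, tags):
    """Riemann-Stieltjes sum: sum of f(xi_k) * (g(x_k) - g(x_{k-1}))."""
    intervals = zip(partition, partition[1:])
    return sum(f(t) * (g(right) - g(left))
               for (left, right), t in zip(intervals, tags))


def unit_step(x):
    """The integrator of Example 2: 0 for x <= 0 and 1 for x > 0."""
    return 0.0 if x <= 0 else 1.0


def midpoint_rs_sum(f, n):
    """R-S sum of f against unit_step over n equal subintervals of [-1, 1]."""
    partition = [-1 + 2 * k / n for k in range(n + 1)]
    tags = [(left + right) / 2 for left, right in zip(partition, partition[1:])]
    return rs_sum(f, unit_step, partition, tags)
```

Only the subinterval that straddles the jump contributes, so the sum is a single value of f taken close to 0, and it approaches f(0) = cos(0) = 1 as the partition is refined.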


3. Let g(x) be as in Example 2. We will show that g is not Riemann-Stieltjes 
integrable with respect to itself.¹ Let P be a partition that contains 0. Suppose x_k 
= 0. Note that osc(g, [x_{j−1}, x_j]) = 0 and Δg_j = 0, except for j = k + 1, and osc(g, 
[x_k, x_{k+1}]) = Δg_{k+1} = 1. Thus $\sum \operatorname{osc}(g, [x_{k-1}, x_k])\,\Delta g_k = 1$, and the result follows 
from Corollary 17.21. 
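
The failure in Example 3 shows up at the level of Riemann-Stieltjes sums: on the subinterval just to the right of 0, the value of the sum depends entirely on which intermediate point is chosen. The sketch below assumes uniform partitions of [−1, 1] containing 0 and compares left-endpoint and right-endpoint intermediate points; the function names are illustrative.

```python
from fractions import Fraction


def rs_sum(f, g, partition, tags):
    """Riemann-Stieltjes sum: sum of f(xi_k) * (g(x_k) - g(x_{k-1}))."""
    intervals = zip(partition, partition[1:])
    return sum(f(t) * (g(right) - g(left))
               for (left, right), t in zip(intervals, tags))


def unit_step(x):
    """g(x) = 0 for x <= 0 and g(x) = 1 for x > 0."""
    return 0 if x <= 0 else 1


def tagged_sums(n):
    """R-S sums of g against itself over 2n equal subintervals of [-1, 1].

    Returns the sum with left endpoints as intermediate points and the
    sum with right endpoints as intermediate points.
    """
    partition = [Fraction(k, n) for k in range(-n, n + 1)]
    left_tags = partition[:-1]
    right_tags = partition[1:]
    return (rs_sum(unit_step, unit_step, partition, left_tags),
            rs_sum(unit_step, unit_step, partition, right_tags))
```

However fine the partition, the two sums are 0 and 1, so no single number I can be within ε of every Riemann-Stieltjes sum once ε ≤ 1/2.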


Since the condition of Corollary 17.21 is much easier to check than the 
definition, we will prove most of the theorems that follow for increasing 
integrators. In the end we will see that this restriction does not hamper us much. 


EXERCISES 17.5 


0) 
TT 


1. Let f(x) =x = sin(x). Evaluate J-ajof a9 


2. Prove Theorem 17.20. 


3. (a) If f: [a, b] → R is continuous, c ∈ (a, b), and g is the function 


evaluate $\int_a^b f\,dg$. 

(b) If f: [a, b] → R is continuous and x_1, ..., x_n ∈ [a, b], construct a 
Riemann-Stieltjes integral whose value is f(x_1) + ... + f(x_n). 


(c) Suppose (x_n) is an increasing sequence in [a, b] with lim x_n = b and $\sum a_n$ 
is a convergent, positive series. Construct a Riemann-Stieltjes integral whose 
value is $\sum a_n f(x_n)$. 


(d) Suppose (x_n) is a sequence contained in [a, b]. Is there necessarily a 
Riemann-Stieltjes integral whose value is $\sum f(x_n)$? 


4. If f is Riemann-Stieltjes integrable with respect to the differentiable function 
g, show that 

\[ \int_a^b f\,dg = \int_a^b f(x)\,g'(x)\,dx. \]


(Note that the last expression is a Riemann integral.) 
17.6 RIEMANN-STIELTJES INTEGRABLE FUNCTIONS 


THEOREM 17.22: If f : [a, b] → R is continuous and g : [a, b] → R is 
increasing, then f is Riemann-Stieltjes integrable with respect to g. 


PROOF: Since f is uniformly continuous, osc(f, [x, y]) can be made small 
uniformly (that is, everywhere in the interval at once) by making y − x small. Let 
ε > 0 be given and let δ > 0 be such that osc(f, [x, y]) < ε/(g(b) − g(a)) whenever 
y − x < δ. If P is any partition with μ(P) < δ, we have 

\[
\sum \operatorname{osc}(f, [x_{k-1}, x_k])\,\Delta g_k
< \frac{\varepsilon}{g(b) - g(a)} \sum \Delta g_k
= \varepsilon,
\]

since $\sum \Delta g_k = g(b) - g(a)$. 


All of our arguments can be adjusted to hold for decreasing integrators. The 


following theorem, whose proof is Exercise 17.6.1, can be combined with this 
observation to open important avenues in the theory. 


THEOREM 17.23: If f is integrable with respect to both g and h, then f is 
integrable with respect to g + h and 

\[ \int_a^b f\,d(g + h) = \int_a^b f\,dg + \int_a^b f\,dh. \qquad ∎ \]


A continuous function is thus integrable with respect to any function that can be 
written as a sum of an increasing function and a decreasing function. Such 
functions are said to be of bounded variation. 


COROLLARY 17.24: If f: [a, b] → R is continuous and g : [a, b] → R is of 
bounded variation, then f is integrable with respect to g. ∎ 


You will examine functions of bounded variation in Exercise 17.7.6. 


EXERCISES 17.6 
1. Prove Theorem 17.23. 


2. (a) Suppose g is increasing and f is bounded. If f is continuous at all points 
where g is discontinuous, show that f is Riemann-Stieltjes integrable with 
respect to g. 


(b) Let f(x) = 0 for x ≤ 0 and f(x) = 1 for x > 0 and let g(x) = 0 for x < 0 and 
g(x) = 1 for x ≥ 0. Show that f is integrable with respect to g but that neither f 
nor g is integrable with respect to itself. 

(c) Show that h is integrable with respect to f [in part (b)] if and only if h is 
continuous from the right at 0. 

(d) Show that h is integrable with respect to g [in part (b)] if and only if h is 
continuous from the left at 0. 

(e) If g is increasing, show that f is integrable with respect to g if and only if 
there are no points where f and g have a common one-sided discontinuity 
(that is, where they are discontinuous from the same side). 


17.7 INTEGRATION BY PARTS 


Corollary 17.24 goes a long way toward settling one of the most important 
questions of Riemann-Stieltjes integration, but it doesn't tell the whole story. 
There are discontinuous functions and functions that are not of bounded 
variation that can be integrated. Integration by parts, a familiar subject from 
calculus, opens more possibilities. This may have seemed a purely symbolic 
exercise in calculus, but it is considerably deeper and speaks to the subtle 
interplay of integrator and integrand in a Riemann-Stieltjes integral. The proof of 
this theorem is technical (it might be a good idea to draw a picture as you go 
along) but the formula in the conclusion of the theorem should be familiar. 


THEOREM 17.25: If f is integrable with respect to g, then g is integrable with 
respect to f and 

\[ \int_a^b f\,dg = f(b)g(b) - f(a)g(a) - \int_a^b g\,df. \]


PROOF: Let P = {x_0, ..., x_n} be a partition of [a, b] such that if P′ is any 
refinement of P and Ξ = {ξ_1, ..., ξ_n} is a collection of intermediate points, |R(f, g, 
P′, Ξ) − I| < ε. Let P′ = {y_0, ..., y_{2n}} = P ∪ Ξ. Then P′ is also a partition of [a, b], 
with y_{2k} = x_k and y_{2k−1} = ξ_k, and P′ is a refinement of P. Consider the sum 
$\sum_{k=1}^{n} g(\xi_k)\,\Delta f_k$. By adding and subtracting the terms f(y_0)g(y_0), f(y_2)g(y_2), ..., 
rearranging, and noting that a = y_0 and b = y_{2n}, we see that 

\[
\begin{aligned}
\sum_{k=1}^{n} g(\xi_k)\,\Delta f_k
&= f(b)g(b) - f(a)g(a) - \sum_{k=1}^{2n} f(\zeta_k)\bigl(g(y_k) - g(y_{k-1})\bigr) \\
&= f(b)g(b) - f(a)g(a) - R(f, g, P', Z),
\end{aligned}
\]

where Z = {ζ_1, ..., ζ_{2n}} and ζ_k is always one of the original partition points x_k. 
The last sum is a Riemann-Stieltjes sum for f with respect to g over a refinement 
of P, and so is within ε of $\int_a^b f\,dg$, and the result follows. ∎ 
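
The rearrangement at the heart of this proof is a finite identity, so it can be checked exactly for a particular choice of functions, partition and intermediate points. The sketch below does this; the polynomials f and g, the partition P and the points in `tags` are arbitrary illustrative choices, and `parts_identity_sides` is a hypothetical helper name.

```python
from fractions import Fraction


def rs_sum(f, g, partition, tags):
    """Riemann-Stieltjes sum: sum of f(xi_k) * (g(x_k) - g(x_{k-1}))."""
    intervals = zip(partition, partition[1:])
    return sum(f(t) * (g(right) - g(left))
               for (left, right), t in zip(intervals, tags))


def parts_identity_sides(f, g, partition, tags):
    """Return both sides of the rearrangement used to prove Theorem 17.25.

    Left side:  sum of g(xi_k) * (f(x_k) - f(x_{k-1})).
    Right side: f(b)g(b) - f(a)g(a) - R(f, g, P', Z), where P' interleaves
    the partition points with the intermediate points and Z tags each new
    subinterval with its endpoint that belongs to the original partition.
    """
    left_side = rs_sum(g, f, partition, tags)
    refined, zeta = [partition[0]], []
    for (left, right), t in zip(zip(partition, partition[1:]), tags):
        refined += [t, right]   # y_{2k-1} = xi_k and y_{2k} = x_k
        zeta += [left, right]   # tags for [x_{k-1}, xi_k] and [xi_k, x_k]
    a, b = partition[0], partition[-1]
    right_side = f(b) * g(b) - f(a) * g(a) - rs_sum(f, g, refined, zeta)
    return left_side, right_side


def f(x):
    return x * x


def g(x):
    return 3 * x + 1


P = [Fraction(0), Fraction(1, 3), Fraction(1, 2), Fraction(1)]
tags = [Fraction(1, 4), Fraction(1, 3), Fraction(3, 4)]
```

The two sides agree exactly, including when an intermediate point coincides with a partition point (as 1/3 does here), since the resulting subinterval of length zero contributes nothing.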


Combining Corollary 17.24 and Theorem 17.25, we have: 


COROLLARY 17.26: If f is of bounded variation on [a, b] and g is continuous, 
then f is Riemann-Stieltjes integrable on [a, b] with respect to g. ∎ 


Theorem 17.25 essentially doubles the collection of Riemann-Stieltjes integrals, 
but we have only scratched the surface of this subject. A detailed study would 
fill another book. 


EXERCISES 17.7 


1. Draw a picture to describe the construction of the partition P′ in the proof of 
Theorem 17.25. 


2. Write out the manipulations of the sum $\sum_{k=1}^{n} g(\xi_k)\,\Delta f_k$ described in the proof 
of Theorem 17.25. 


3. (a) Let g(x) = sin x. Evaluate [p' xdg. 


(b) Find a general formula for simplifying an integral $\int_a^b x\,dg$. What properties 
must the function g have to make your formula valid? 


4. If a mass m is placed on the x-axis at the point x > 0, its moment about the 
origin is mx. (You learned about moments as a child: A seesaw will balance 
if the moments of the masses on either end are the same.) If several masses, 
m_1, m_2, ..., m_n, are placed at x_1, x_2, ..., x_n, the moment of the system is 
$\sum_{k=1}^{n} m_k x_k$. Suppose we have a wire of variable density, placed along the 
x-axis between x = a > 0 and x = b, and a function g(x) = [the mass of the wire 
to the left of x]. Explain why the moment of the wire is $\int_a^b x\,dg$. 


5. If a probability experiment yields numerical results r_1, r_2, ..., r_n with 
associated probabilities p_1, p_2, ..., p_n (that is, the probability you get r_k is p_k), 
the expected value of the experiment is $\sum_{k=1}^{n} p_k r_k$. Now suppose that the 
possible results of the experiment include all numbers between a and b. Then 
we would have, instead of a finite collection of probabilities, a cumulative 
distribution function p(x), where p(x) = [the probability that the result is ≤ 
x]. 

(a) Explain why p(x) = 0 if x < a and p(x) = 1 ifx=>b. 

(b) Explain why p(x) is increasing. 

(c) Show that, if the probability of getting the individual result r is 0, then
p(x) is continuous at r.


(d) Explain why p(x) is "right-continuous," that is, $\lim_{x \to r^+} p(x) = p(r)$, but if
the probability of getting r is not 0, then p(x) is not left-continuous at r.


(e) Describe an experiment where the probability of some specific number is 
not 0 (in some contexts such a number is called an atom). 


(f) Show that an experiment like this can have only countably many atoms. 
(See Exercise 13.2.4.) 


(g) Explain why the expected value of such an experiment is $\int_a^b x\,dp$.


6. In this project we describe the functions of bounded variation in a different
way. Consider functions defined on an interval [a, b]. A collection of closed
intervals $\{[x_n, y_n]\}$, each contained in [a, b], is said to be nonoverlapping if
the open intervals $\{(x_n, y_n)\}$ are disjoint (the closed intervals may intersect
only at endpoints if at all). The subintervals in a partition, for instance, are
nonoverlapping, but a collection of intervals need not come from a partition,
and need not be finite, to be nonoverlapping. We say f is of bounded
variation on [a, b] if $\sup \sum |f(y_n) - f(x_n)|$ is finite, the supremum taken over
all nonoverlapping collections of subintervals of [a, b]. This supremum is
called the variation of f over [a, b], denoted V(f, a, b).


(a) Show that a bounded, monotone function is of bounded variation over any 
interval and compute its variation. 


(b) Suppose f is such that [a, b] can be written as a finite collection of
intervals on each of which f is monotone. Show that f is of bounded variation
and describe how to find the variation of f.


(c) Show that the function given by f(x) = x sin(1/x) if x ≠ 0 and f(0) = 0 is
not of bounded variation, but $g(x) = x^2 \sin(1/x)$ if x ≠ 0 and g(0) = 0 is of
bounded variation.


(d) Show that the sum of two functions of bounded variation is of bounded 
variation and that any constant multiple of a function of bounded variation is 
of bounded variation. 


(e) Show that it is possible for a collection of nonoverlapping intervals
(contained in a bounded interval) to be infinite. Show that f is of bounded
variation if and only if the supremum of the sums in the definition is finite
when taken only over finite collections of intervals.


(f) Show that f is of bounded variation if and only if each sum used in the
definition has a finite value, whether the sum has finitely or infinitely many
terms.

(g) The function $V_P(f, a, b) = \sup \sum (f(b_n) - f(a_n))$ is called the positive
variation of f, and $V_N(f, a, b) = \inf \sum (f(b_n) - f(a_n))$ is called the negative
variation of f (the supremum and infimum being taken over all
nonoverlapping collections). Show that $V(f, a, b) = V_P(f, a, b) - V_N(f, a, b)$.


(h) If $x \in (a, b]$ we define the variation function, the positive variation
function, and the negative variation function by $v(f, x) = V(f, a, x)$,
$v_P(f, x) = V_P(f, a, x)$, and $v_N(f, x) = V_N(f, a, x)$, respectively. Show that $v_P$ is
increasing and $v_N$ is decreasing.


(i) If f is of bounded variation, show that $v(f, x) = v_P(f, x) - v_N(f, x)$ and
$f(x) = v_P(f, x) + v_N(f, x)$.

(j) Show that f is of bounded variation if and only if it can be written as the
sum of an increasing function and a decreasing function (the definition given
here is equivalent to the one given in the chapter).

(k) Let $y \in [a, b)$. Suppose f has the property that there are numbers c < d and
a decreasing sequence $(a_n)$ that converges to y, with $f(a_{2n}) < c$ and
$f(a_{2n+1}) > d$. Draw a picture that describes this situation. Show that f is not of
bounded variation. (The significance of this is discussed in Chapter 19.)


(l) Suppose f is not of bounded variation on [a, b]. Show that there is a point
$c \in [a, b]$ so that f is not of bounded variation on the set
$[c - \varepsilon, c + \varepsilon] \cap [a, b]$ for any ε > 0. (Hint: Think about the proof of the
Bolzano-Weierstrass theorem.)


7. (a) Suppose f is defined on some open interval containing the point x. Show
that f is continuous at x if and only if, for any ε > 0, there is a δ > 0 so that
$\operatorname{osc}(f; [\alpha, \beta]) < \varepsilon$ whenever $x \in [\alpha, \beta]$ and $\beta - \alpha < \delta$.

(b) Suppose f is defined on [a, b]. Let $\{[\alpha_n, \beta_n]\}$ be a nonoverlapping
collection of intervals in [a, b]. We say f is absolutely continuous on [a, b]
if, for any ε > 0, there is a δ > 0 so that $\sum \operatorname{osc}(f; [\alpha_n, \beta_n]) < \varepsilon$ whenever
$\sum (\beta_n - \alpha_n) < \delta$. Show that an absolutely continuous function is continuous.
(c) Find a continuous function that is not absolutely continuous. 

(d) Show that an absolutely continuous function is of bounded variation. 


(e) If f is differentiable and $|f'(x)| < B$ for all $x \in [a, b]$, show that the length of
the interval f([a, b]) is not more than B(b − a). (You should begin by saying
how we know that f([a, b]) is an interval.)

(f) If f has a bounded derivative, show that f is absolutely continuous.


(g) Show that the function F(x) defined in the Second Fundamental theorem 
is absolutely continuous. (Some of the significance of absolute continuity is 
explored in Chapter 19. An absolutely continuous function is one that can be 
recovered by integrating its derivative.) 


8. Here is another way to describe the integral of a function. Suppose
f: [a, b] → R is an increasing function.


(a) Show that, if $a < y_1 < y_2 < b$, then $\{x : y_1 < f(x) < y_2\}$ is an interval.


(b) Suppose the range of f is [c, d]. Let $P = \{c = y_0, y_1, \ldots, y_n = d\}$ be a
partition of [c, d], and construct the sum $\sum_{k=1}^{n} y_k L_k$, where $L_k$ is the length of
the interval $\{x : y_{k-1} < f(x) < y_k\}$. Draw a picture describing this.


(c) Let $f(x) = \sqrt{x}$ and [a, b] = [0, 1] (so that [c, d] is also [0, 1]), and suppose
the points of P are equally spaced. Compute the sum in (b) for n = 4.


(d) Find the limit as n → ∞ of the sums in (c). Compare this with $\int_0^1 \sqrt{t}\,dt$.


(e) If f is any increasing function, show that the limit of the sums described in
(b) is $\int_a^b f(x)\,dx$.

(f) Discuss how this process would have to be modified to allow for 
functions that are not increasing. (You will find, eventually, that you will be 
thinking about the material in Section 21.4. This is the very tiniest beginning 
of what is called Lebesgue integration.) 


1 That this can happen for such a simple function has led many people to consider alternative definitions
of the Riemann-Stieltjes integral.


2 In the next two exercises, you are asked to "explain" some things. You should convince yourself that
the statements are true with whatever degree of precision you wish, but proofs are beyond the scope of the
book.


Chapter 18 


Interchanging Limit Processes 


18.1 A RECURRING PROBLEM


The deepest theorems in calculus describe what happens when we carry out 
multiple limiting processes in different orders. Our naive hope is that the result 
of such a process should not depend on the order in which the individual limits 
are computed, but even in the simplest situations this is not the case. 
Differentiation, integration, convergence of sequences and series, and 
determination of continuity are all limiting processes, and so we have dealt with 
this unavoidable issue in Theorems 15.4, 15.5, 17.14, and 17.16. Observe that 
the first two of these theorems tell us, in essence, that the results of the multiple 
limits involved are independent of the order in which the limits are computed, 
while the latter two say something different. In this chapter, we look at more 
examples of this problem. We will begin with a very direct manifestation of it. 


18.2 DOUBLE SEQUENCES 


A double sequence, denoted $(a_{mn})$, is a real-valued function whose domain is
the Cartesian product N × N. We wish to "let m and n go to infinity," but there
are many ways this can be said to occur, as illustrated in the first example.(1)


EXAMPLES 18.2: 1. Let $a_{mn} = m/(m + n)$. Notice that $\lim_{n\to\infty} a_{mn} = 0$
regardless of the value of m, while $\lim_{m\to\infty} a_{mn} = 1$ regardless of the value of n.
So $\lim_{n\to\infty}(\lim_{m\to\infty} a_{mn}) = 1$ while $\lim_{m\to\infty}(\lim_{n\to\infty} a_{mn}) = 0$. There are even
more possible "limits" for this double sequence. If we consider only those terms
where m = n, we have $\lim_{m\to\infty} a_{mm} = 1/2$.


If we always pick n = 2m, then we have $\lim_{m\to\infty} a_{m,2m} = 1/3$. We can make this
limit seem to be any number from 0 to 1 by fixing an appropriate relationship
between m and n.


2. Let $b_{mn} = 1/(m^2 + n^2)$. Then $b_{mn} \to 0$ as either m → ∞ or n → ∞ (each
regardless of the other index). Furthermore, if ε > 0 is given, then $b_{mn} < \varepsilon$
whenever $m^2 + n^2 > 1/\varepsilon$.


In any of the processes described in the first example, m and n can be said to "go 
to infinity." Which of these values is the limit of the sequence? None of them. 
An ordinary sequence can have only one limit. This property should probably be 
preserved however we define the limit of a double sequence. The first two 
processes in Example 1 are called iterated limits. Iterated limits are easy to
compute, since they involve nothing more than two ordinary limits (and no need 
to assume a relationship between m and n). But the iterated limits of a double 
sequence can be different, which doesn't help us define the "limit" of a double 
sequence. 
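
The path dependence described in Example 1 can be checked numerically. The following is only a sketch: the cutoff `big` is an arbitrary stand-in for "infinity," and a large finite index suggests a limit without proving it.

```python
def a(m, n):
    """Term a_mn = m / (m + n) of the double sequence in Example 1."""
    return m / (m + n)

big = 10**6  # arbitrary stand-in for "infinity" (an assumption of this sketch)

# Inner limits: fix one index and push the other far out.
inner_n = a(5, big)   # n large with m = 5: close to 0
inner_m = a(big, 5)   # m large with n = 5: close to 1

# Along the path n = k*m the term equals 1/(1 + k) for every m.
along_diagonal = a(big, big)    # k = 1
along_n_2m = a(big, 2 * big)    # k = 2

print(inner_n, inner_m, along_diagonal, along_n_2m)
```

Each way of "going to infinity" settles on a different value, which is why none of them can serve as the limit of the double sequence.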

In Example 2, we see that we can consider the behavior of a sequence as m 
and n get large either independently or together. The requirement that $m^2 + n^2$ is
large says that the point (m, n) is "far away from the origin" in the sense that it 
lies outside some large circle. It is a bit easier to require that m and n be outside a 
large square, and this is how we make our definition. You will show in Exercise 
18.2.1 that the results are the same. 


DEFINITION 18.1: The double sequence $(a_{mn})$ has limit L if, for any ε > 0,
there is a natural number N so that $|a_{mn} - L| < \varepsilon$ whenever m, n > N. If this is the
case, we write $\lim_{m,n\to\infty} a_{mn} = L$.


Iterated limits seem easier to compute than double limits since they are 
essentially just ordinary limits. Our goal should be to see how the limit of a 
double sequence is related to its associated iterated limits. In the first example 
we saw that it is possible for both iterated limits to exist while the double limit 
fails to exist. The existence of a double limit does not even guarantee the 
existence of the iterated limits, as seen in the next example. 


EXAMPLES 18.2: 3. Let $c_{mn} = (-1)^{m+n}(1/m + 1/n)$. If m, n > N, we have

$$|c_{mn} - 0| = 1/m + 1/n < 2/N,$$

which can be made as small as we like. Thus the double limit of this sequence is
0 even though neither iterated limit exists!
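
A short numerical look at Example 3 may help. This is an illustrative sketch only; the cutoff `N` and the sampled index ranges are arbitrary choices, not part of the text.

```python
def c(m, n):
    """Term c_mn = (-1)^(m+n) * (1/m + 1/n) of Example 3."""
    return (-1) ** (m + n) * (1 / m + 1 / n)

N = 1000  # arbitrary cutoff for this sketch

# Double limit: every sampled term with m, n > N lies within 2/N of 0.
worst = max(abs(c(m, n))
            for m in range(N + 1, N + 50)
            for n in range(N + 1, N + 50))

# Inner limit with m = 3 fixed: consecutive terms sit near -1/3 and +1/3,
# so the sequence (c_3n) keeps oscillating and has no limit in n.
tail = [c(3, n) for n in range(10**6, 10**6 + 4)]

print(worst, tail)
```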


Under what conditions, if any, does the convergence of the double sequence 
imply the existence of the iterated limits? Under what conditions can the double 
limit be found by evaluating the iterated limits? In the last example, the iterated 
limits didn't exist because it was not possible to compute the "inside" limits. The 
next theorem shows that this is the only way this problem can occur. 


THEOREM 18.2: Suppose that the double limit $\lim_{m,n\to\infty} a_{mn}$ exists and the
limit $\lim_{n\to\infty} a_{mn}$ exists for each m. Then the iterated limit
$\lim_{m\to\infty}(\lim_{n\to\infty} a_{mn})$ exists and is equal to the double limit. A similar
statement holds for the other iterated limit.


PROOF: Let $a = \lim_{m,n\to\infty} a_{mn}$ and $a_m = \lim_{n\to\infty} a_{mn}$ for each m. If ε > 0 is
given, there is a natural number N so that $|a_{mn} - a| < \varepsilon$ when m, n > N. This is
true for all n > N, and the function f(x) = |x − a| is continuous. Thus

$$|a_m - a| = \left|\left(\lim_{n\to\infty} a_{mn}\right) - a\right| = \lim_{n\to\infty} |a_{mn} - a| \le \varepsilon$$

for m > N. Thus $\lim_{m\to\infty} a_m = a$, as desired. ∎


COROLLARY 18.3: If the double limit $\lim_{m,n\to\infty} a_{mn}$ exists, the limit
$\lim_{n\to\infty} a_{mn}$ exists for each m, and the limit $\lim_{m\to\infty} a_{mn}$ exists for each n, then the
iterated limits both exist and are the same. ∎


Getting from the iterated limits back to the double limit will not be so easy. We 
have seen that it is possible for both iterated limits to exist while the double limit 


does not. Consider $d_{mn} = (mn)/(m^2 + n^2)$. The iterated limits are both 0, while
$\lim_{m\to\infty} d_{mm} = 1/2$, and so the double limit can't exist. Thus the double limit can
fail to exist even when the iterated limits are the same. In this example,
$\lim_{n\to\infty} d_{mn} = 0$ for any m, but the way $d_{mn}$ approaches the limit depends on the value of
m. Everything falls into place when we eliminate this dependence.


DEFINITION 18.4: Suppose $\lim_{n\to\infty} a_{mn} = a_m$ for each m. This convergence is
said to be uniform in m if, for each ε > 0, there is a natural number $N_\varepsilon$ (which
does not depend on m) so that $|a_{mn} - a_m| < \varepsilon$ when $n > N_\varepsilon$. The phrase uniform
in n is defined similarly.
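
To see what uniformity in m rules out, one can compare the two sequences $d_{mn} = mn/(m^2 + n^2)$ and $b_{mn} = 1/(m^2 + n^2)$, both of which tend to 0 as n → ∞ for each fixed m. The sketch below measures, for a given n, the largest distance from the limit over a finite range of m. The bound `m_max` is an assumption of the sketch, standing in for the supremum over all m.

```python
def d(m, n):
    """d_mn = mn / (m^2 + n^2); tends to 0 as n grows, for each fixed m."""
    return (m * n) / (m * m + n * n)

def b(m, n):
    """b_mn = 1 / (m^2 + n^2); also tends to 0 as n grows, for each fixed m."""
    return 1 / (m * m + n * n)

def worst_gap(term, n, m_max=2000):
    """Largest |term(m, n) - 0| over m = 1..m_max (finite stand-in for the sup)."""
    return max(abs(term(m, n)) for m in range(1, m_max + 1))

for n in (10, 100, 1000):
    print(n, worst_gap(d, n), worst_gap(b, n))
```

For $d_{mn}$ the worst gap stays at 1/2 (attained at m = n) no matter how large n is, so no single $N_\varepsilon$ works for all m. For $b_{mn}$ the worst gap is $1/(1 + n^2)$, which shrinks with n, so that convergence is uniform in m.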


Each time we have introduced an idea of "uniformity," we have first proved a 
theorem to the effect that if a "double" process is nice enough, the two
"single" processes are uniform in some way. This time is no different. 


THEOREM 18.5: If the double limit $\lim_{m,n\to\infty} a_{mn}$ exists and the limit
$\lim_{n\to\infty} a_{mn}$ exists for each m, then the convergence of the latter is uniform in m.


PROOF: Let ε > 0 be given. Let $\lim_{m,n\to\infty} a_{mn} = L$ and $\lim_{n\to\infty} a_{mn} = a_m$ for
each m. Since the double limit exists, there is a natural number $N_1$ so that
$|a_{mn} - L| < \varepsilon/2$ whenever $m, n > N_1$. In the proof of Theorem 18.2, we saw that
$|a_m - L| \le \varepsilon/2$ whenever $m > N_1$. Thus if $m, n > N_1$, we have

$$|a_{mn} - a_m| \le |a_{mn} - L| + |L - a_m| < \varepsilon.$$

We must show that this condition does not depend on m. We have shown that the
condition holds for all $m > N_1$, and so it could "depend on m" only if it failed for
one of $m = 1, 2, \ldots, N_1$. For each of these, there is a number $N_m$ so that
$|a_{mn} - a_m| < \varepsilon$ whenever $n > N_m$. Let $N_2 = \max\{N_m\}$. Then, for $m = 1, 2, \ldots, N_1$, we
have $|a_{mn} - a_m| < \varepsilon$ whenever $n > N_2$. Finally, let $N = \max\{N_1, N_2\}$. If n > N, we
have $|a_{mn} - a_m| < \varepsilon$ for all m. ∎


THEOREM 18.6: If $\lim_{n\to\infty} a_{mn}$ exists for each m and $\lim_{m\to\infty} a_{mn}$ exists for
each n, and if the convergence of one of these is uniform in the other index, then
the double limit $\lim_{m,n\to\infty} a_{mn}$ exists and all three limits are equal.


PROOF: Let $\lim_{n\to\infty} a_{mn} = x_m$ and $\lim_{m\to\infty} a_{mn} = y_n$, and suppose that the
convergence of $a_{mn}$ to $x_m$ is uniform in m. Let ε > 0 be given. There is a natural
number N so that $|a_{mn} - x_m| < \varepsilon/3$ whenever n > N and for all m. We want to
show that $(x_m)$ converges. Now $a_{mn} \to y_n$, and so $(a_{mn})$ is a Cauchy sequence
(with index m). Fix n > N. If j and k are large enough,

$$|x_j - x_k| \le |x_j - a_{jn}| + |a_{jn} - a_{kn}| + |a_{kn} - x_k| < \varepsilon/3 + \varepsilon/3 + \varepsilon/3.$$

Thus $(x_m)$ is a Cauchy sequence and so it converges. We have shown that the
iterated limit $\lim_{m\to\infty} \lim_{n\to\infty} a_{mn}$ exists. Now we show the double limit exists.
(The rest of the theorem follows from Theorem 18.2.) Let $x = \lim x_m$. There is a
number M so that $|x_m - x| < \varepsilon/3$ whenever m > M. With N as before, we have
$|a_{mn} - x| \le |a_{mn} - x_m| + |x_m - x| < \varepsilon$ whenever m, n > max{M, N}. ∎


EXERCISES 18.2 


1. (a) Show that the definition of convergence of a double sequence is
equivalent to saying $c_{mn}$ is close to L if m and n are "outside a big enough
circle."

(b) Show that the definition of convergence of a double sequence is also
equivalent to saying $c_{mn}$ is close to L if m and n are "outside a big enough
rectangle," in the sense that $\lim_{m,n\to\infty} c_{mn} = L$ if and only if, for any ε > 0,
there are numbers M and N so that $|c_{mn} - L| < \varepsilon$ whenever m > M and n > N.

(c) Show that the definition of convergence of a double sequence is not
equivalent to saying $c_{mn}$ is close to L if m and n are "outside a wide enough
rectangle," where we might say a rectangle is "wide" if the larger of its two
dimensions is big.


2. Verify that the double limit in the first example in the chapter fails to exist.


3. Let $c_{mn} = (-1)^{m+n}(1/m + 1/n)$. Show that neither iterated limit exists.


4. (a) Let $c_{mn} = (mn)/(m^2 + n^2)$. Show that both iterated limits are 0 but that the
double limit doesn't exist.


(b) Describe how $\lim_{n\to\infty} c_{mn}$ depends on m.


5. Show that the limit of a sum of double sequences is the sum of the limits. 
State and prove other simple algebraic formulas. 


18.3 INTEGRALS AND SEQUENCES 


The next two examples of interchange of limits have a more direct bearing on 
calculus and help complete the discussion of power series and Taylor series. 


EXAMPLES 18.3: 1. Let $f_n(x)$ be defined by the following graph. Notice that $f_n$
is continuous (hence integrable) for all n and the integral of $f_n$ is 1. Now
$f(x) = \lim_{n\to\infty} f_n(x) = 0$ for all $x \in [0, 1]$ (be sure you see that this is so).
Consequently the integral of f is 0, and the "limit of the integrals" is not the same as the
"integral of the limit."


We will make quick work of this problem. 
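
The failure in Example 1 can be reproduced numerically. The graph that defines $f_n$ is not reproduced in this text, so the sketch below assumes a hypothetical stand-in with the stated properties: a triangle of height n over [0, 2/n] and zero on the rest of [0, 1]. The function names and the midpoint rule are likewise choices made for this illustration.

```python
def tent(n, x):
    """Hypothetical f_n: a triangle of height n on [0, 2/n], zero elsewhere."""
    if 0 <= x <= 1 / n:
        return n * n * x
    if 1 / n < x <= 2 / n:
        return n * n * (2 / n - x)
    return 0.0

def integral_01(g, steps=100_000):
    """Midpoint-rule approximation of the integral of g over [0, 1]."""
    h = 1 / steps
    return h * sum(g((k + 0.5) * h) for k in range(steps))

x = 0.05
for n in (2, 10, 50):
    # The integral stays near 1, while the value at the fixed point x
    # drops to 0 as soon as 2/n < x.
    print(n, integral_01(lambda t: tent(n, t)), tent(n, x))
```

The maximum of this $f_n$ is n, so the convergence to 0 is far from uniform, which is exactly the hypothesis the next theorem adds.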


THEOREM 18.7: If $(f_n)$ is a sequence of integrable functions with domain [a, b]
and $f_n \to f$ uniformly, then f is integrable on [a, b] and

$$\int_a^b \lim_{n\to\infty} f_n(x)\,dx = \int_a^b f(x)\,dx = \lim_{n\to\infty} \int_a^b f_n(x)\,dx.$$


PROOF: That f is integrable follows from Exercise 17.2.4. To establish the limit
formulas, we will prove the special case where f = 0 (so that the limit on the right
should be 0). Note that

$$\left|\int_a^b f_n(x)\,dx\right| \le \int_a^b |f_n(x)|\,dx \le \|f_n\|_\infty (b - a).$$

The first inequality is from Lemma 17.15. The right side goes to 0 as $\|f_n\|_\infty$ goes
to 0, and the result follows. ∎


EXERCISES 18.3 


1. Let $f_n(x)$ be the function described in the sketch in this section. Show that
$\lim_{n\to\infty} f_n(x) = 0$ for all $x \in [0, 1]$.


2. Verify the claim in the proof of Theorem 18.7 that "... f is integrable follows
from Exercise 17.2.4."


3. Define a sequence of functions $f_n$ as follows: Let $r_1, r_2, \ldots$ be an enumeration
of the rational numbers. Let

$$f_n(x) = \begin{cases} 1 & x \in \{r_1, r_2, \ldots, r_n\} \\ 0 & \text{otherwise.} \end{cases}$$

(a) Show that $f_n$ is integrable on any closed, bounded interval for all n.


(b) Show that $\lim f_n = D(x)$, the Dirichlet function, so that $\lim f_n$ is not
integrable over any interval.


4. (a) Show that lim, ee J ¢ —nz* dr _ 0 
1 "1 _p-2 ‘i 
(b) Does the argument you used in (a) apply to "x2 Jo ¢ dx'g 
(c) Show that iT & e-n2" de = 0 


18.4 DERIVATIVES AND SEQUENCES 


Derivatives are more complicated than integrals. It may happen that the
sequence $(f'_n)$ converges uniformly while $(f_n)$ diverges (take $f_n(x) = n$, for
instance). We will see in Chapter 20 that even uniform convergence of a
sequence of differentiable functions does not guarantee differentiability of the
limit, much less give a means of finding its derivative. This is not particularly
surprising. Functions that are uniformly close together can have derivatives that
differ greatly. These difficulties are, however, cleared up in a way that is not
surprising. We have to be careful to avoid the situation described above, and
presently we will make the proof very easy by adding what is really an
unnecessary hypothesis (continuity of the functions $f'_n$). This theorem is, then,
only a special case, but one that includes power series. You might wish to
consider how the proof would go without this added condition.


THEOREM 18.8: Suppose $f_n$ is differentiable on an interval (a, b) for all n and
that $f'_n$ is continuous for all n. If the sequence $(f'_n)$ converges uniformly and there
is a point $x_0 \in (a, b)$ so that $(f_n(x_0))$ converges, then $(f_n)$ converges uniformly to a
differentiable function f and $\lim_{n\to\infty} f'_n = f'$.


PROOF: By subtracting constants from each function, we may assume $f_n(x_0) = 0$
for all n [this is acceptable because $(f_n(x_0))$ converges]. By the Second
Fundamental theorem, $f_n(x) = \int_{x_0}^{x} f'_n(t)\,dt$ for each n, and by Theorem 18.7 we
have

$$\lim_{n\to\infty} f_n(x) = \lim_{n\to\infty} \int_{x_0}^{x} f'_n(t)\,dt = \int_{x_0}^{x} \lim_{n\to\infty} f'_n(t)\,dt.$$

If we let $g(x) = \lim_{n\to\infty} f'_n(x)$, we have, by the Second Fundamental theorem, that
the function $f(x) = \int_{x_0}^{x} g(t)\,dt = \lim_{n\to\infty} f_n(x)$ is differentiable and that
$f' = g = \lim_{n\to\infty} f'_n$. ∎


The most important uses of this result come in applying it to power series, but in 
order to do so, we will need to know the radius of convergence of a 
differentiated series. The proof of this lemma is left as Exercise 18.4.3. 


LEMMA 18.9: If $\sum a_n (x - a)^n$ has radius of convergence R, then the series of
derivatives $\sum n a_n (x - a)^{n-1}$ also has radius of convergence R. ∎


Since every power series converges for at least one point (specifically, a) we can
combine Theorems 15.15 and 18.8 and Lemma 18.9 to obtain:


THEOREM 18.10: If $\sum a_n (x - a)^n$ has radius of convergence R and t is such that
|t − a| < R, then the function given by $f(x) = \sum a_n (x - a)^n$ is differentiable at t and

$$f'(t) = \sum n a_n (t - a)^{n-1}.$$


EXAMPLES 18.4: 1. We saw in Chapter 15 that the series $\sum_{n=0}^{\infty} x^n/n!$ converges
for all x. Though we suspect it strongly, this does not mean that this series is
equal to $e^x$. We now have machinery that will let us show this. Since the radius
of convergence of this series is infinite, we can find the derivative term by term
for all x (by Theorem 18.10), and the radius of convergence of the differentiated
series is also infinite (by Lemma 18.9). The differentiated series is, of course, the
same as the original one. Thus the function given by $f(x) = \sum_{n=0}^{\infty} x^n/n!$ is such
that f′(x) = f(x) for all x and f(0) = 1. You showed in Exercise 16.5.4 that $e^x$ is the
only function
satisfying these two conditions. Using this argument, we could base a study of 
exponential functions on the theory of series, instead of the other way around. 
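
As a numerical companion to this example (a sanity check, not a proof), the sketch below compares partial sums of the series with Python's `math.exp` and confirms that differentiating a partial sum term by term reproduces the previous partial sum. The function names and the default of 30 terms are choices made for the illustration.

```python
import math

def exp_partial_sum(x, terms=30):
    """Sum of x**n / n! for n = 0 .. terms - 1."""
    total, term = 0.0, 1.0
    for n in range(terms):
        total += term
        term *= x / (n + 1)  # turns x**n / n! into x**(n+1) / (n+1)!
    return total

def exp_derivative_partial_sum(x, terms=30):
    """Term-by-term derivative: sum of n * x**(n-1) / n! for n = 1 .. terms - 1."""
    return sum(n * x ** (n - 1) / math.factorial(n) for n in range(1, terms))

x = 1.5
print(exp_partial_sum(x), math.exp(x))
print(exp_derivative_partial_sum(x), exp_partial_sum(x, terms=29))
```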


EXERCISES 18.4 


1. Carefully state and verify that "two functions can be uniformly close together 
while their derivatives differ greatly." 


2. (a) How is the condition of continuity of the derivatives $f'_n$ used in the proof
of Theorem 18.8?


(b) Discuss Theorem 18.8 without this condition. 


3. (a) Prove Lemma 18.9. 


(b) Show that the power series obtained by antidifferentiating another series 
term by term has the same radius of convergence as the original. (Assume 
that all the constants of integration are 0.) 


4. Use the technique of Example 18.4.1 and Exercise 16.5.4 to show that

$$\sum_{n=0}^{\infty} \frac{(-1)^n x^{2n}}{(2n)!} = \cos x.$$


5. Prove Theorem 18.10.


18.5 TWO APPLICATIONS


FOURIER SERIES: Consider the function f defined by

$$f(x) = \frac{\pi}{2} - \sum_{n=0}^{\infty} \frac{4}{(2n+1)^2 \pi} \cos((2n+1)x).$$

(The term π/2 is separated only because it doesn't fit the pattern of the others.)
This series converges uniformly by the Weierstrass M-test, and the limit function 
is continuous by Theorem 15.4. What does the limit function look like? Here is 
part of the graph of the sum of the series up to n = 3: 


This looks remarkably like the absolute value function! Of course, if we add up 
the periodic functions cos((2n + 1)x), we expect the sum to be periodic. Here is 
the same partial sum over a larger domain: 


[Figure: the same partial sum plotted for x from about −6 to 6.]


So this is not exactly the absolute value function, but over an interval a bit larger 
than [—3, 3] its resemblance to |x| can't be denied. Is this any more than an 
illusion? To evaluate this series for a particular value of x (other than x = 0) 
would be very difficult. We can, however, make some roundabout tests in which 
we compare f(x) and |x|. Since the series defining f converges uniformly, 
Theorem 18.7 says we can integrate it term by term. Doing so, we find that 


$$\int_{-\pi}^{\pi} f(x)\,dx = \pi^2 = \int_{-\pi}^{\pi} |x|\,dx$$

[since $\int_{-\pi}^{\pi} \cos(nx)\,dx = 0$ for n = 1, 2, ...]. Of course, the fact that the integrals of
two functions are the same says precious little about the functions themselves.
Now, if we multiply each term in the series for f by a bounded, continuous 
function, the series will still converge uniformly and we can still integrate the 
result term by term. For instance, 


$$\int_{-\pi}^{\pi} f(x)\cos(x)\,dx = -4 = \int_{-\pi}^{\pi} |x|\cos(x)\,dx.$$


Pursuing this idea further, we find that 


$$\int_{-\pi}^{\pi} f(x)\cos(nx)\,dx = \frac{2((-1)^n - 1)}{n^2} = \int_{-\pi}^{\pi} |x|\cos(nx)\,dx$$

for n = 1, 2, .... [To do this calculation, note that $\int_{-\pi}^{\pi} \cos(mx)\cos(nx)\,dx = 0$ if m ≠ n,
while $\int_{-\pi}^{\pi} \cos^2(nx)\,dx = \pi$ for all n.]
We still can't say with certainty that f(x) = |x| for −π ≤ x ≤ π, but we have
"tested" both functions in infinitely many ways and gotten the same result every
time. This seems to be very strong evidence (if you need more, show that

$$\int_{-\pi}^{\pi} f(x)\sin(nx)\,dx = \int_{-\pi}^{\pi} |x|\sin(nx)\,dx = 0$$


for n = 1, 2, ...). The series defining f(x) is called the Fourier series of |x|. The 
question of the relationship between a function and its Fourier series is a study in 


itself and is not entirely settled.(2) You will see how to get from a function to its
Fourier series in Exercise 18.5.1. 
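
The resemblance between f and |x| can also be explored numerically. The sketch below assumes the series reads $\pi/2 - \sum_{n \ge 0} \frac{4}{(2n+1)^2\pi}\cos((2n+1)x)$; the function name and the number of terms are choices made for this illustration. Agreement at sample points is evidence in the same spirit as the integral tests above, not a proof.

```python
import math

def fourier_abs(x, terms):
    """Partial sum pi/2 - (4/pi) * sum_{n=0}^{terms-1} cos((2n+1)x) / (2n+1)^2."""
    s = sum(math.cos((2 * n + 1) * x) / (2 * n + 1) ** 2 for n in range(terms))
    return math.pi / 2 - 4 / math.pi * s

# Inside [-pi, pi] the partial sums track |x| closely.
for x in (-3.0, -1.0, 0.0, 0.5, 2.0, 3.0):
    print(x, fourier_abs(x, 4), fourier_abs(x, 1000), abs(x))

# Outside [-pi, pi] the sum is periodic, so it follows |x - 2*pi| near x = 4.
print(fourier_abs(4.0, 1000), 2 * math.pi - 4.0)
```

With `terms=4` the sum runs up to n = 3, matching the partial sum discussed in the text.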


THE ANTIDERIVATIVE OF A POWER SERIES: Combining the results of 
Theorems 15.15 and 18.7, we obtain: 


COROLLARY 18.11: If the series $\sum a_n (x - a)^n$ has radius of convergence R and
$[b, c] \subset (a - R, a + R)$, then the integral of the series over this interval is the limit of
the integrals of its partial sums. ∎


EXAMPLES 18.5: 1. The function $f(x) = e^{-x^2}$ is of overwhelming importance
in the study of statistics. A student of that subject is often called upon to discuss
the relative sizes of integrals of f over various sets. We learn in calculus, though,
that this function does not have an antiderivative that can be expressed easily. On
the other hand, since f is continuous, the Second Fundamental theorem tells us
that $F(x) = \int_0^x e^{-t^2}\,dt$ is an antiderivative of f. We may not be able to say much about
F(x), but we can write its Maclaurin series. Since $e^t = 1 + t + t^2/2 + t^3/6 + \cdots$,
and this series has an infinite radius of convergence, we have

$$e^{-t^2} = 1 - t^2 + \frac{t^4}{2} - \frac{t^6}{6} + \cdots$$

for all t. Antidifferentiating this, we have $F(x) = x - \frac{x^3}{3} + \frac{x^5}{10} - \frac{x^7}{42} + \cdots$.
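
The series for F can be checked against a library function. The sketch below assumes the series has general term $(-1)^n x^{2n+1}/(n!\,(2n+1))$, obtained by antidifferentiating $(-1)^n t^{2n}/n!$, and uses the standard identity $\int_0^x e^{-t^2}\,dt = \frac{\sqrt{\pi}}{2}\operatorname{erf}(x)$. The function names and the 40-term cutoff are choices made for this illustration.

```python
import math

def F_series(x, terms=40):
    """Partial sum of sum_{n>=0} (-1)^n x^(2n+1) / (n! (2n+1))."""
    return sum((-1) ** n * x ** (2 * n + 1) / (math.factorial(n) * (2 * n + 1))
               for n in range(terms))

def F_erf(x):
    """The same integral expressed through the error function."""
    return math.sqrt(math.pi) / 2 * math.erf(x)

for x in (0.5, 1.0, 2.0):
    print(x, F_series(x), F_erf(x))
```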


EXERCISES 18.5 


1. (a) Evaluate all the integrals mentioned in the section about Fourier series. 


(b) The Fourier series of a function f(x) having period 2π is given by

$$\frac{a_0}{2} + \sum_{n=1}^{\infty} \bigl(a_n \cos(nx) + b_n \sin(nx)\bigr),$$

where $a_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\cos(nx)\,dx$ and $b_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\sin(nx)\,dx$ (the π in the
denominator is there for technical reasons). Find the Fourier series of the
function given on [−π, π) by

$$f(x) = \begin{cases} -1 & -\pi \le x < 0 \\ 1 & 0 \le x < \pi \end{cases}$$

and extended to the real line by saying f(x + 2π) = f(x) for all x.


(c) The series you found in (b) can't converge uniformly. Why? 


2. Find the Fourier series for the functions f(x) = x and $f(x) = x^2$.


3. If we let f(x) = x for 0 ≤ x ≤ π, then the absolute value function is the "even"
extension of f to [−π, π] and y = x is the "odd" extension. We have found the
Fourier series for these two functions, but they are quite different. Discuss 
this. 


4. Find the Maclaurin series for “\") = Jo cos() dt, 


5. (a) Find the Maclaurin series for $f(t) = \frac{1}{1 + t^2}$.

(b) Find the Maclaurin series for $\arctan(x) = \int_0^x \frac{1}{1 + t^2}\,dt$.

(c) Describe the set over which the calculation in (b) is valid.

(d) Recall that arctan(1) = π/4. Find a series for π.


(e) Note that 1 is an endpoint of the interval of convergence of the series in
(b). Is the representation in (d) valid?


(f) Show that the series in (d) converges conditionally. (This fact has given
rise to a booming cottage industry: the finding of longer and longer decimal
expansions of π.)

(g) How many terms of the series in (d) would have to be added to produce
an approximation for π accurate to an error of less than $10^{-10}$?

(h) What would be the practical difficulties involved in performing the
calculation in (g) on a computer?


(i) Why is there no cottage industry devoted to producing decimal expansions
of e?


18.6 INTEGRALS WITH A PARAMETER 


Our final examples involve functions defined by integrals. Suppose that f is
defined on some rectangle [a, b] × [c, d] and consider $F(t) = \int_a^b f(x, t)\,dx$. In this
context, the variable t is called a parameter. We will show that under certain
conditions F is continuous, and then turn our attention to the (somewhat more
interesting) question of the derivative of F.


THEOREM 18.12: If f is continuous on [a, b] × [c, d] and F is defined as
above, then F is uniformly continuous on [c, d].


PROOF: We will have to accept that some theorems we have proved for the real
line also hold in the plane. The rectangle [a, b] × [c, d] is a compact subset of $R^2$,
and so f is uniformly continuous. Thus, given ε > 0, there is a δ > 0 so that, for
any $x \in [a, b]$, $|f(x, t_1) - f(x, t_2)| < \varepsilon$ whenever $|t_1 - t_2| < \delta$. Then

$$|F(t_1) - F(t_2)| = \left|\int_a^b f(x, t_1) - f(x, t_2)\,dx\right| \le \int_a^b |f(x, t_1) - f(x, t_2)|\,dx \le \varepsilon(b - a),$$

and consequently F is uniformly continuous (since this estimate does not depend
on $t_1$ or $t_2$). ∎


THEOREM 18.13: If f and $\frac{\partial f}{\partial t}$ are continuous on [a, b] × [c, d] and F is as
above, then F is differentiable on (c, d) and $F'(t) = \int_a^b \frac{\partial}{\partial t} f(x, t)\,dx$.


PROOF: Since we have a candidate for the derivative, we should examine the
expression

$$F(u) - F(t) - \left[\int_a^b \frac{\partial}{\partial t} f(x, t)\,dx\right](u - t),$$

which should be $o(u - t)$. Inserting the definition of F, this becomes:

$$\int_a^b f(x, u)\,dx - \int_a^b f(x, t)\,dx - \left[\int_a^b \frac{\partial}{\partial t} f(x, t)\,dx\right](u - t).$$

Observe that the expression (u − t) may be moved inside the last integral since it
doesn't depend on x. Having done this, we can combine the three integrals into
one to obtain

$$\int_a^b \left[f(x, u) - f(x, t) - \frac{\partial}{\partial t} f(x, t)(u - t)\right] dx.$$

The integrand, as a function of u and t, is $o(u - t)$ (this is the definition of the
partial derivative). The entire expression is continuous as a function of x,
uniformly so in u and t. Thus it is integrable, and its integrals are uniformly
bounded, say by M. It follows that the integral is not larger than $M[o(u - t)] = o(u - t)$,
as desired. ∎
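
A numerical check of Theorem 18.13 on one example may be reassuring. The sketch below takes the hypothetical choice f(x, t) = sin(xt) on [0, 1], for which $F(t) = (1 - \cos t)/t$ when t ≠ 0, and compares three values of F′(t): the integral of the partial derivative, a central difference quotient of F, and the closed form. The midpoint rule, step sizes, and function names are all choices made for this illustration.

```python
import math

def integrate(g, a, b, steps=2000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (k + 0.5) * h) for k in range(steps))

def F(t):
    """F(t) = integral over [0, 1] of sin(x t) dx."""
    return integrate(lambda x: math.sin(x * t), 0.0, 1.0)

def F_prime_via_theorem(t):
    """Integral over [0, 1] of the partial derivative x cos(x t)."""
    return integrate(lambda x: x * math.cos(x * t), 0.0, 1.0)

def F_prime_via_difference(t, h=1e-5):
    """Central difference quotient of F at t."""
    return (F(t + h) - F(t - h)) / (2 * h)

def F_prime_exact(t):
    """Derivative of (1 - cos t)/t, valid for t != 0."""
    return (t * math.sin(t) - 1 + math.cos(t)) / (t * t)

t = 1.3
print(F_prime_via_theorem(t), F_prime_via_difference(t), F_prime_exact(t))
```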


EXERCISES 18.6 


1. (a) Verify that Theorem 18.13 holds if f is a polynomial of two variables.


(b) Experiment with more complicated functions: trigonometric functions,
exponentials, and so on.


2. Suppose f: [a, b] × [c, d] → R is continuous and is constant in its second
variable for any value of the first variable [that is, $f(x, t_1) = f(x, t_2)$ for any x,
$t_1$, and $t_2$].


(a) Show that the function $F(t) = \int_a^b f(x, t)\,dx$ is constant.

(b) Prove this using Theorem 18.13.


(c) Prove this without referring to Theorem 18.13.


3. Suppose f: [a, b] × [c, d] → R and f(x, t) satisfies the partial differential
equation $A u_{tt} + B u_t + C u = D$ (where A, B, C, and D are constants).

(a) Show that the function $F(t) = \int_a^b f(x, t)\,dx$ satisfies the ordinary
differential equation Ay″ + By′ + Cy = D(b − a).

(b) Consider this result without worrying about hypotheses on the function f.


(c) Carefully describe the conditions f must satisfy to make this true.


18.7 "THE MOORE THEOREM" 


There is a pattern in all this, having something to do with uniformity of one of 
the limit processes. Can this be generalized? We will state, but not prove, such a 
generalization. The proof may be found in The Theory of Functions of Real 
Variables by Lawrence M. Graves (published by McGraw-Hill Book Company 


in 1946).(3) 


THEOREM 18.14: Suppose the functions f(x, y), g(x), and h(y) are all finite-
valued and that $\lim_{x\to\alpha} f(x, y) = h(y)$ on T and $\lim_{y\to\beta} f(x, y) = g(x)$ uniformly on
S. Then the limits $\lim_{(x,y)\to(\alpha,\beta)} f(x, y)$, $\lim_{x\to\alpha} g(x)$, and $\lim_{y\to\beta} h(y)$ all exist and
are equal and finite. ∎


EXERCISES 18.7 


1. In this exercise, we adapt the least squares method to the problem of 
derivatives. The least-squares (or regression) line for the finite set of points 
{(x; vy) : i= 1, ..., m} 1s the line y = mx + b, where m and 5b are chosen to 


minimize the expression 2i=1\¥i~ VU" *))” (m and b are found by setting 


partial derivatives to 0). The regression line is supposed to "look like" the set 
of points in some way. Presently we will choose m(h) and b(h) so as to 
minimize 


eth = 
| (f(t) — m(h)t — b(h))” dt 
“iL 


and let Lf{x) = lim,_,9 m(h) if this limit exists. This is certainly the slope of a 
line that looks like fin a certain way. 


(a) Suppose f is continuous. Find formulas for m(h) and b(A). Discuss the 
interchange of integrals and partial derivatives in this calculation. 


(b) If f(x) =x’, show that Lf(a) = 2a for all a. 


(c) Show that if fis differentiable (in the ordinary sense), then f has a 
derivative in this sense and f(x) = Lf{x). 


(d) Find a function that is not differentiable (in the ordinary sense) but is 
differentiable in this sense. 


(e) Show that if f is differentiable in this sense and $c \in \mathbb{R}$, then cf is
differentiable in this sense, and L(cf) = cL(f).


(f) Show that if f and g are differentiable in this sense, then f + g is 
differentiable in this sense, and L(f+ g) = L(f) + L(g). 


(g) Does the Product Rule work for this derivative? Is it necessarily the case
that L(fg)(a) = f(a)L(g)(a) + g(a)L(f)(a)?(4)


2. Reconsider Exercise 15.5.5 in light of the results of this chapter. 
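
The definition in Exercise 1 can be explored numerically before attempting the proofs. The sketch below is only an approximation under stated assumptions: it replaces the integral over [x − h, x + h] by a discrete least-squares fit at equally spaced sample points, and the names `least_squares_slope`, `square`, and the `samples` parameter are hypothetical choices made for this illustration.

```python
def least_squares_slope(f, x, h, samples=2001):
    """Slope of the regression line through `samples` equally spaced
    points (t, f(t)) with t in [x - h, x + h].

    Assumption: a discrete fit stands in for the integral in the exercise.
    """
    ts = [x - h + 2 * h * i / (samples - 1) for i in range(samples)]
    ys = [f(t) for t in ts]
    t_bar = sum(ts) / samples
    y_bar = sum(ys) / samples
    # Standard regression-slope formula: covariance over variance.
    num = sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, ys))
    den = sum((t - t_bar) ** 2 for t in ts)
    return num / den


def square(t):
    return t * t


# Fitted slopes for f(t) = t^2 around x = 1.5 as the window shrinks.
for h in (1.0, 0.1, 0.01):
    print(h, least_squares_slope(square, 1.5, h))
```

For f(t) = t^2 the fitted slope stays very close to 2x for each window size, which is consistent with part (b); the experiment suggests the result but does not prove it.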


1 There is a potential problem in this notation: Is $a_{113}$ "a sub eleven, three," "a sub one, thirteen," or "a
sub 113" of an ordinary sequence? Should we need to refer to a specific term in a double sequence, we will
use commas: $a_{11,3}$ or $a_{1,13}$.


2 It is only a small exaggeration to say that the study of Fourier series gave rise to most of modern
analysis. Cantor was led to his concept of cardinality in part by his study of a question about Fourier series, 
for instance. Fourier himself had the distinction (if it can be called that) of having been condemned to death 
by both sides in the French Revolution. Neither sentence seems to have been carried out. 


3 This text was once recommended to me with the statement, "If it's not in Graves, it's not true." "The 
Moore Theorem" is Theorem 2 in Chapter VII and is accompanied by my candidate for the most obscure 
footnote reference ever written: 


See E. H. Moore, "Lectures on Advanced Integral Calculus" (unpublished), 
University of Chicago, Autumn Quarter, 1900. Manuscript in University of 
Chicago library, worked out by Oswald Veblen .... 


As Casey Stengel used to say, you can look it up! 


4 This idea is examined in "A new extension of the derivative" American Mathematical Monthly 97 
(1990), 230-233, by Daniel B. Kopel and the author. It was first developed as Mr. Kopel's undergraduate 
thesis. 


Part Four 


Selected Shorts 


In the final part of the book we will look at three short subjects and then plug a 
lingering gap in our knowledge. In the next three chapters, we consider questions 
that are very different from those to which we have become accustomed. Most of 
our time in mathematics classes is spent proving statements of the form "If X
does Y, then X must also do Z." In this part of the book, we examine questions of the form
"If X does Y, how badly can X fail to do Z?"


In Chapter 19 we ask "If a function is increasing, at how many points can it
fail to be continuous?" This may be the least natural question of the three, and 
the fact that we can give a complete answer may come as a surprise. Next we ask 
"If a function is continuous, at how many points can it fail to be differentiable?" 
We will find that we spent all our time in calculus talking about functions that, in 
a very precise sense, are hardly there at all! Finally we consider the question "If 
a function is integrable, at how many points can it fail to be continuous?" This is 
the deepest question of the three, and we will fall just short of finding a complete 
answer to it. 


These questions help us see how the properties described in the Big Theorem 
affect what we can and can't do with calculus and show us how delicately the 
whole subject is strung together. There is a subtle common theme to these 
questions that gives us some hints about directions we might go in the subject. 


Finally, we dispose of a problem that has been begging our attention from the 
start. We now know a lot about how the real numbers work, but we still don't 
know what they are! In Chapter 22 we will, at long last, build the real number 
system. Much of this book has been designed to shake us in our certainty that we 
know at least what the real numbers are. In Chapter 22 our understanding will be 
brought full circle. What better way to finish? 


Chapter 19 


Increasing Functions 


19.1 DISCONTINUITIES 


We spend so much time and effort studying continuous functions and their 
properties that it might never occur to us that discontinuities could have 
something interesting to teach us, too. In calculus, we saw two basic types of 
discontinuities. First, we encountered the type we will now call a jump, which 
might look like this: 


Later we encountered the more dramatic behavior of a function like f(x) = sin(1/x),
which is discontinuous at x = 0 despite the fact that there is no obvious break in 
the graph. The graph looks something like this for 0 < x < 2, but it oscillates so 
quickly as x gets near 0 that it is very difficult to depict accurately: 


An even more extreme example of this "up-and-down" behavior is found in the 
Dirichlet function (which is impossible to graph): 


$$D(x) = \begin{cases} 1 & \text{if } x \in \mathbb{Q} \\ 0 & \text{if } x \notin \mathbb{Q}. \end{cases}$$


After we describe them precisely, we will find that these two cases (the jump and
the bouncing up and down) are, in a precise sense, the only possibilities!


DEFINITION 19.1: (a) Suppose $f : A \to \mathbb{R}$ and that A is a neighborhood or
deleted neighborhood of a.(1) Then f has a jump discontinuity at a if the one-
sided limits $\lim_{x \to a^+} f(x)$ and $\lim_{x \to a^-} f(x)$ both exist but are different. This is also
called a simple discontinuity or (in contrast to the next definition) a
discontinuity of the first type.


(b) Suppose $f : A \to \mathbb{R}$ and that there is a δ > 0 so that either (a, a + δ) or (a − δ,
a) is contained in A.(1) Then f has a discontinuity of the second type at the
point a if either $\lim_{x \to a^+} f(x)$ or $\lim_{x \to a^-} f(x)$ fails to exist.


This terminology is quite bland, but we're stuck with it. The observant reader has 
noticed that there is a third possibility not accounted for in these definitions. It 
might happen that $\lim_{x \to a^+} f(x) = \lim_{x \to a^-} f(x)$ [that is, $\lim_{x \to a} f(x)$ exists], but f is
discontinuous. Such a discontinuity can be "removed" by redefining the
function, setting $f(a) = \lim_{x \to a} f(x)$. We will assume that all such removable


discontinuities have been dealt with in this way and will not concern ourselves 
with them further. We will also concentrate for now only on bounded functions 
since unbounded functions present different (and less interesting) problems. 

The Dirichlet function and the function f(x) = sin(1/x) both display 
discontinuities of the second type (the former at every real number, the latter 
only at x = 0). These functions are very different, which might lead us to think 
that such discontinuities are intractable. Fortunately, they are more manageable 
than we might fear. The following lemma says that a function with a 
discontinuity of the second type must display the up-and-down behavior 
apparent in sin(1/x) and the Dirichlet function. 


LEMMA 19.2: Suppose $f : A \to \mathbb{R}$ is bounded and there is a δ > 0 such that
$(a, a + \delta) \subseteq A$. Then $\lim_{x \to a^+} f(x)$ fails to exist if and only if there are numbers c <
d so that for every 0 < ε < δ there are numbers x and y in (a, a + ε) with f(x) < c
and f(y) > d. A similar statement holds if $\lim_{x \to a^-} f(x)$ fails to exist.


PROOF: Since $\lim_{x \to a^+} f(x)$ fails to exist, there is a decreasing sequence $(x_n)$
with $\lim x_n = a$ but such that $\lim f(x_n)$ does not exist. Since f is bounded, the
sequence $(f(x_n))$ is also bounded. Then $(f(x_n))$ is a bounded, divergent sequence,
and so it has subsequences that converge to two different limits, say γ and δ. We
may assume that γ < δ. Any numbers c and d with γ < c < d < δ will satisfy the
theorem (be sure you see why this is so). ∎


EXAMPLES 19.1: 1. Let D(x) be the Dirichlet function, c = 1/4, and d = 3/4.
For any $a \in \mathbb{R}$ and any ε > 0, if $x \in (a, a + \varepsilon)$ is rational and $y \in (a, a + \varepsilon)$ is
irrational, then D(x) > d and D(y) < c.


2. For the function f(x) = sin(1/x), we can take c = −1/2 and d = 1/2. Then x can
be any number of the form 1/((2n + 3/2)π) [at each of which sin(1/x) = −1], and y
can be any number of the form 1/((2n + 1/2)π) [where sin(1/y) = 1]. There is one
of each of these points in any interval (0, ε).


3. Lemma 19.2 does not hold if the function is not bounded. Consider f(x) = 1/x.
Though $\lim_{x \to 0^+} f(x)$ fails to exist, the conclusion of the theorem does not hold.
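
The second example can be checked numerically. The following is a minimal sketch in floating-point arithmetic; the function name `witnesses` and the particular way n is chosen are assumptions of this illustration, not part of the text.

```python
import math


def witnesses(eps):
    """Return (x, y) with 0 < x < y < eps, sin(1/x) = -1 and sin(1/y) = 1
    (up to floating-point rounding)."""
    # Pick n large enough that 1 / ((2n + 1/2) * pi) < eps.
    n = math.floor(1 / (2 * math.pi * eps)) + 1
    x = 1 / ((2 * n + 1.5) * math.pi)
    y = 1 / ((2 * n + 0.5) * math.pi)
    return x, y


for eps in (1.0, 0.1, 1e-3, 1e-6):
    x, y = witnesses(eps)
    print(eps, x, math.sin(1 / x), y, math.sin(1 / y))
```

However small ε is taken, the interval (0, ε) contains a point where f falls below c = −1/2 and another where it rises above d = 1/2, which is exactly the up-and-down behavior described in Lemma 19.2.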


EXERCISES 19.1 
1. Complete the proof of Lemma 19.2. 


2. Show that a function having a discontinuity of the second type can't be of 
bounded variation (see Exercise 17.7.6). 


3. Suppose that f has a discontinuity of the second type as x → a from the right,
but is continuous otherwise. Show that the arc length of the graph of f is
infinite in any interval (a, a + ε). (Don't worry about the technicalities of the
definition of arc length now.)


19.2 DISCONTINUITIES OF MONOTONE FUNCTIONS 


The next two results show that the discontinuities of monotone functions and the 
behavior of such functions near a discontinuity are very predictable. 


THEOREM 19.3: A monotone function can have only jump discontinuities. 


PROOF: We need only observe that the condition of Lemma 19.2 can't occur
for a monotone function. Suppose f is increasing, x > a, and c and d are such that
f(x) < c < d; then f(y) ≤ f(x) < c for all $y \in (a, x)$, and it cannot be the case that f(y)
> d for any such y. ∎


LEMMA 19.4: If f is an increasing function having a discontinuity at a, then
$\lim_{x \to a^+} f(x) > \lim_{x \to a^-} f(x)$. The inequality is reversed for a decreasing function.


PROOF: Since the discontinuity must be a jump, both one-sided limits exist.
We will show first that $\lim_{w \to a^+} f(w) \ge \lim_{w \to a^-} f(w)$. Let ε > 0 be given. Since both
one-sided limits exist, we may choose x > a so that $f(x) - [\lim_{w \to a^+} f(w)] < \varepsilon/2$
and y < a so that $[\lim_{w \to a^-} f(w)] - f(y) < \varepsilon/2$. Then


$$\lim_{w \to a^+} f(w) - \lim_{w \to a^-} f(w) > \left(f(x) - \frac{\varepsilon}{2}\right) - \left(f(y) + \frac{\varepsilon}{2}\right) = f(x) - f(y) - \varepsilon \ge -\varepsilon.$$


The last inequality holds because f is increasing and y < x. This is true for any ε
> 0. By Exercise 4.6.13, $\lim_{w \to a^+} f(w) - \lim_{w \to a^-} f(w) \ge 0$. Since we are assuming f
doesn't have a removable discontinuity at a, it can't be the case that
$\lim_{w \to a^-} f(w) = \lim_{w \to a^+} f(w)$, and strict inequality holds. ∎


THEOREM 19.5: The set of points at which a monotone function is 
discontinuous is countable. 


PROOF: Suppose f is increasing. If f has a discontinuity at the point a, let
$I_a = (\lim_{x \to a^-} f(x), \lim_{x \to a^+} f(x))$. By Lemma 19.4, $I_a \ne \emptyset$. We will show that if f
has discontinuities at a and b, then $I_a \cap I_b = \emptyset$. Suppose a < c < b. Since f is
increasing,


$$\lim_{x \to a^+} f(x) \le f(c) \le \lim_{x \to b^-} f(x).$$


Thus y < f(c) for $y \in I_a$ and y > f(c) for $y \in I_b$. It follows that
$I_a \cap I_b = \emptyset$. This means that the association $a \to I_a$ is one-to-one. In the proof of
Theorem 8.11, we showed that any collection of mutually disjoint nonempty
open intervals is countable, so {a : f has a discontinuity at a} is countable. ∎
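
The intervals used in this proof can be computed exactly for a concrete function. The sketch below is an illustration under stated assumptions: the step function f(x) = 2^(−⌊1/x⌋) on (0, 1], which is increasing and jumps at x = 1/n for n = 2, 3, ..., is a hypothetical example chosen for this purpose, and the helper names are not from the text.

```python
from fractions import Fraction
import math


def f(x):
    """Assumed example: increasing step function on (0, 1],
    f(x) = 2^(-floor(1/x)), with a jump at each x = 1/n, n >= 2."""
    return Fraction(1, 2 ** math.floor(1 / x))


def jump_interval(n):
    """Endpoints (left limit, right limit) of f at a = 1/n.

    f is constant just to the left and just to the right of 1/n, so
    evaluating at 1/n -/+ delta gives the one-sided limits exactly.
    """
    a = Fraction(1, n)
    delta = Fraction(1, 2 * n * (n + 1))  # less than the gap to 1/(n+1)
    return f(a - delta), f(a + delta)


intervals = {n: jump_interval(n) for n in range(2, 21)}
for n in (2, 3, 4):
    print(n, intervals[n])
```

Each jump produces a nonempty open interval of skipped values, and intervals belonging to different jump points never overlap, which is the mechanism that makes the set of discontinuities countable.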


EXERCISES 19.2 
1. Show that a composition of increasing functions is increasing. 
2. Draw a sketch to illustrate the proof of Theorem 19.3. 


3. Complete the proof of Lemma 19.4. 


4. (a) Show that the association $a \to I_a$ in the proof of Theorem 19.5 is one-to-
one.


(b) We haven't checked that the association $a \to I_a$ is onto. How can we
draw the conclusion at the end of the proof?


5. Show that the discontinuities of a decreasing function are all jumps and that 
there can be at most countably many of them. 


19.3 MORE ON JUMPS 


The proof of Theorem 19.5 rests heavily on the fact that the function involved is 
increasing. But it is not so much the nature of increasing functions as it is the 
nature of jumps that makes the result true. In contrast to the following theorem, 
recall that the Dirichlet function has a discontinuity of the second type at every 
point of the real line. 


THEOREM 19.6: Any function can have only countably many jump 
discontinuities. 


PROOF: We show first that if f has a jump discontinuity at a, it can't have
another one as big nearby. Suppose f has a jump discontinuity at a and let
$|\lim_{x \to a^+} f(x) - \lim_{x \to a^-} f(x)| = \varepsilon$ (we will call this the jump of f at a). There is a δ
> 0 so that $|f(x) - \lim_{w \to a^+} f(w)| < \varepsilon/2$ when $x \in (a, a + \delta)$. Then for $x, y \in (a, a + \delta)$,
we have |f(x) − f(y)| < ε, and there can be no jump as big as ε in the interval (a,
a + δ). A similar argument holds in an interval (a − δ, a) (though δ may be
different). Thus if f has a jump of ε at a, there is a δ > 0 so that f has no jump of
more than ε in the interval (a − δ, a + δ). Now suppose $a_1$ and $a_2$ are points
where f has a jump of at least ε (with corresponding $\delta_1$ and $\delta_2$). Note that $(a_1 - \delta_1/2, a_1 + \delta_1/2)$
and $(a_2 - \delta_2/2, a_2 + \delta_2/2)$ are disjoint. The points where f has a
jump of ε or more can each be enclosed in an open interval, and these intervals
can be chosen to be mutually disjoint. Thus there are only countably many such
points. Thus for each natural number n, there are countably many points where f
has a jump discontinuity with jump more than 1/n. But any jump is larger than
1/n for some n (why?), so


$$\{a : f \text{ has a jump discontinuity at } a\} = \bigcup_{n=1}^{\infty} \{a : f \text{ has a jump of more than } 1/n \text{ at } a\}.$$


We have written the set of points where f has a jump discontinuity as a countable
union of countable sets, and so this set is countable. ∎


Notice that Theorem 19.5 follows directly from Theorems 19.3 and 19.6. It is 
sometimes easier to show that the discontinuities of a function must be jumps 
than to count them directly (see Exercise 19.1.2). 


EXERCISES 19.3 


1. If f has the property that |f(x) − f(y)| < ε for all x and y, show that f can't have
a jump discontinuity with a jump of more than ε.


2. Suppose that f has only jump discontinuities. If f has a jump at a, write
$J_a = \lim_{x \to a^+} f(x) - \lim_{x \to a^-} f(x)$ and let $J(x) = \sum_{a < x} J_a$.


(a) Show that J is continuous at any point where f is continuous.


(b) Show that J is constant on any interval on which f is continuous and has a
jump discontinuity at each point where f has a jump discontinuity. A function
whose graph consists of intervals of constancy separated by jump 
discontinuities is called a jump function. 


(c) Show that the function f(x) — J(x) is continuous. Thus every function 
having only jump discontinuities can be written as the sum of a continuous 
function and a jump function. 


19.4 THE CANTOR FUNCTION 


We finish this chapter by looking at another example of curious behavior in an 
increasing function. Here we will review the construction of the Cantor set 
(Exercise 8.4.7), while at the same time building a very strange function. We 
begin with a lemma, whose proof is left as Exercise 19.4.1. 


LEMMA 19.7: If $(f_n)$ is a sequence of functions converging to f, and if $f_n$ is
increasing for each n, then f is increasing. ∎


We now begin our construction. The construction of the Cantor set begins with
the interval $C_0 = [0, 1]$. At each step in the process, we find ourselves with a
collection of closed intervals. To get to the next stage, we remove the open
middle third of each interval. Now let $K_0(x) = x$ for 0 ≤ x ≤ 1. Removing the
middle third of $C_0$, we obtain $C_1 = [0, 1/3] \cup [2/3, 1]$. It is easiest to define $K_1$ by
showing its graph:


[Figure: graph of $K_1$ on [0, 1]]


(It's not difficult to produce formulas for $K_1$, but the effort would only get in the
way at this point.) Note that $K_1$ is constant on the interval removed from $C_0$ to
make $C_1$ and that the maximum difference between $K_0$ and $K_1$ is 1/6, occurring
at 1/3 and 2/3. To construct $K_2$, $K_3$, ..., we replace each nonhorizontal portion of
the previous one with a piece resembling $K_1$. For example, here is the graph of
$K_2$:


[Figure: graph of $K_2$ on [0, 1]]


Observe that $K_2$ is constant on the interval on which $K_1$ is constant and on the
intervals removed from $C_1$ to make $C_2$, and that the maximum difference
between $K_1$ and $K_2$ is 1/12. We continue in this way with each function $K_n$
having the following properties: $K_n$ is continuous and increasing; $K_n$ is constant
on the intervals on which $K_{n-1}$ is constant and on all the intervals removed from
$C_{n-1}$ to make $C_n$; the maximum difference between $K_n$ and $K_{n-1}$ is $1/(6 \times 2^{n-1})$.
From the last of these properties, it follows that for m > n,


$$\max_{0 \le x \le 1} |K_m(x) - K_n(x)| \le \sum_{k=n+1}^{m} \frac{1}{6 \times 2^{k-1}} < \frac{1}{6 \times 2^{n-1}}.$$


By Theorem 15.6, $(K_n)$ converges uniformly. Call its limit K. Since $K_n$ is
continuous for all n, K is continuous. By Lemma 19.7, K is increasing.
Furthermore, K is constant on every open interval removed from $C_0$ to make the
Cantor set; that is, K'(x) = 0 for x in any of these intervals.


We've constructed a continuous function whose derivative is 0 except on a set 
whose length is 0 (a precise definition of "length 0" is given in Chapter 21). Yet 
the value of K manages somehow to change from 0 to 1. It seems that we have 
gotten from 0 to 1 without ever moving! A function like K is called, 
appropriately enough, singular. Singular functions have properties even more 
peculiar than the one we've seen. The function K’(x) is continuous (and equal to 
0) except for a set of length 0 (the Cantor set). As we shall see later, this means 
that the integral of K'(x) is zero over any interval. But K is not identically 0, and 
so K is not the integral of its derivative. What is the derivative of K? Because of 
the Mean Value theorem, K can't be differentiable in the usual sense at the points 
of the Cantor set. The slanted sections of $K_n$ have slope $(3/2)^n$, and it would
make sense to say that the derivative of K is +∞ at all points of the Cantor set
(we would have to adjust our theory of differentiation to allow infinite
derivatives). It can be shown that the derivative of any continuous, singular
function must be +∞ at any point where it isn't 0.
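
The approximating functions can be computed exactly with rational arithmetic. The sketch below assumes a self-similar recursion for K_n (a half-scale copy of K_(n−1) on each outer third and the constant 1/2 on the middle third); this recursion is an assumption of the illustration, chosen because it reproduces the graph of K_1 and the replacement rule described above, and the names `K` and `max_gap` are hypothetical.

```python
from fractions import Fraction


def K(n, x):
    """n-th approximation to the Cantor function at a Fraction x in [0, 1].

    Assumed recursion: K_0(x) = x, and K_n is built from two half-scale
    copies of K_(n-1) joined by a horizontal segment at height 1/2.
    """
    if n == 0:
        return x
    if x <= Fraction(1, 3):
        return K(n - 1, 3 * x) / 2
    if x >= Fraction(2, 3):
        return Fraction(1, 2) + K(n - 1, 3 * x - 2) / 2
    return Fraction(1, 2)


def max_gap(n):
    """Largest value of |K_n - K_(n-1)| on the grid k / 3^n.

    Both functions are linear between consecutive grid points, so the
    maximum over the grid is the maximum over [0, 1].
    """
    grid = [Fraction(k, 3 ** n) for k in range(3 ** n + 1)]
    return max(abs(K(n, x) - K(n - 1, x)) for x in grid)


for n in range(1, 6):
    first_slope = K(n, Fraction(1, 3 ** n)) * 3 ** n  # slope of first slanted piece
    print(n, max_gap(n), first_slope)
```

The printed gaps begin 1/6, 1/12 and halve at each step, while the slope of the slanted pieces is multiplied by 3/2 at each step, in line with the values quoted in this section.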


EXERCISES 19.4 
1. Prove Lemma 19.7. 


2. Give a precise definition of the function $K_1$.


3. Verify the claims made about the functions $K_n$.


4. Let $f : [a, b] \to \mathbb{R}$ be such that f' is integrable.(2) Recall that the function given
by $F(x) = \int_a^x f'(t)\,dt$ is absolutely continuous (see Exercise 17.7.7). Show
that the function f(x) − F(x) is singular. Every function whose derivative is
integrable can be written as the sum of a singular function and an absolutely
continuous one.


5. Discuss the results of constructing a "Cantor-like" function based on the fat
Cantor set of Exercise 8.4.7.1.


6. (a) Construct an increasing singular function (similar to the Cantor function)
that increases from 0 to 1 on the interval [a, b]. Call this function $f_{[a,b]}$. ($f_{[0,1]}$
should be the Cantor function.)
(b) Show that $f_{[0,1]} + f_{[1,2]}$ is singular. What are its intervals of constancy?


(c) Construct a strictly increasing singular function. 


7. How does the Mean Value theorem justify the statement in the last paragraph
of the section? 


1 This condition only serves to ensure that the limits in the definition make sense.


2 Remember that f' need not be defined everywhere in order for it to be integrable.


Chapter 20 


Continuous Functions and Differentiability 


20.1 SEPARATING THE GOOD FROM THE BAD 


As we learn about functions, the image evoked by the word changes. When we 
are young, a "function" is, we think, a straight line. Later we learn about curves, 
and then discontinuities. Finally we learn about differentiability, and our 
response to the request "draw a graph" might look something like this: 


This graph has points of discontinuity or nondifferentiability separated by
intervals in which the function is differentiable. Is this as complicated as a 
function can get? We know, of course, that a function can have many more 
discontinuities than this. The Dirichlet function is discontinuous everywhere. But 
what about the points of nondifferentiability? At how many points can a 
continuous function fail to be differentiable? We must be a little careful about 
how we phrase this question and what sort of answer we expect. We might, for 
example, specify a set and ask whether there is a function that is differentiable 
(or not) precisely on that set. This is a difficult task, and we will not begin to 
attack it here. But we can easily find a function that fails to be differentiable on 
an infinite set. For instance, the function whose graph is below fails to be 
differentiable at n/2 for each integer n. Though the set of points of 
nondifferentiability of this function is infinite, its graph still has "bad" points 
separated by intervals of "good" ones. 


[Figure: sawtooth graph with corners at n/2 for each integer n]


Can a continuous function be any worse-behaved than this? We will find an 
answer so dramatic that we can stop worrying about exactly what the question 
was! We will construct a function that is continuous everywhere but 
differentiable nowhere. An example of a function like this (which you will 
examine in Exercise 20.3.2) was found by Weierstrass in the late nineteenth 
century. It caused quite a stir in the mathematical world, forcing as it does a 
thorough reconsideration of most of the ideas involved. The function we will 
construct was described by van der Waerden in 1930. It has a pictorial appeal 
that we will find to be, unfortunately, largely misleading. Still, being led down a 
garden path can sometimes be educational. 


20.2 THE NATURE OF CORNERS 


We often say that a differentiable function looks like a straight line as we look at 
it more and more closely. The absolute value function—the first example we see 
of a nondifferentiable function—has a "corner" at x = 0. But how do we identify 
a corner? A graph is a straight line if the quotient (b − d)/(a − c) is the same for
any two points on the graph (a, b) and (c, d). Consider the absolute value 
function. In any neighborhood of 0 we can select two points with positive first 
coordinates. The difference quotient formed using them will be 1. Any 
difference quotient formed using two points with negative first coordinates is —1. 
Within any neighborhood of 0, we can form difference quotients that are far 
apart. 


EXAMPLES 20.2: The description of how to find a corner in the previous
paragraph seems precise enough. It is, however, fallacious. The function given
by


$$f(x) = \begin{cases} x^2 & x \in \mathbb{Q} \\ 0 & x \notin \mathbb{Q} \end{cases}$$


is differentiable at 0 and f'(0) = 0. But we can choose points near and to the right
of 0 to produce a quotient that is just about anything we like! The same can be
done to the left of 0. The picture below indicates why this is so. (Though it is
impossible to draw this graph well, we can get the idea of the problem. You will
examine this claim about secant lines in Exercise 20.2.1.)


It seems, though, that the behavior of secant lines near a point where a function 
is differentiable must be predictable. In this example we have been a little too
lax about how the secant lines were chosen. The following lemma, a sort of 
Cauchy criterion for derivatives, clears this up. 


LEMMA 20.1: The function f is differentiable at the point a if and only if the
following condition holds: For each ε > 0, there is a δ > 0 so that if $x_1$, $x_2$, $x_3$, and
$x_4$ are elements of (a − δ, a + δ) with $x_1$ and $x_2$ on opposite sides of a and $x_3$ and
$x_4$ on opposite sides of a, then


$$\left| \frac{f(x_1) - f(x_2)}{x_1 - x_2} - \frac{f(x_3) - f(x_4)}{x_3 - x_4} \right| < \varepsilon.$$


The phrase "on opposite sides of a" is to be interpreted in the following sense: If
$x_1 < a$, we must have $x_2 \ge a$, and if $x_1 > a$, we must have $x_2 \le a$. If $x_1 = a$ or $x_2 = a$,
then $x_1$ and $x_2$ are on opposite sides of a regardless of where the other is. Of
course, we can always assume that $x_1 < x_2$ if it is convenient to do so.


PROOF: We begin with a question about secant lines. If b < a < c, can the slope 
of the secant line between (b, f(b)) and (c, f(c)) be computed in some easy way 
from the slopes of the two secant lines passing through (a, f(a))? 


[Figure: graph of f with the secant lines over [b, a], [a, c], and [b, c]]


In this picture, the slope of the secant line over [b, a] is less than the slope of the 
secant line over [b, c], and the slope of the secant line over [a, c] is greater than
the slope of the secant line over [b, c]. Perhaps the slope of the longer secant line 
is the average of the other two? Almost any example will show that this is not 
the case. However, the slope of the longer line is a weighted average of the other 
two, and with a little algebra, we can find the appropriate weights: 

$$\frac{f(c) - f(b)}{c - b} = \frac{f(a) - f(b)}{a - b} \cdot \frac{a - b}{c - b} + \frac{f(c) - f(a)}{c - a} \cdot \frac{c - a}{c - b}.$$


If f is differentiable, the large fractions on the right can be made close to f'(a) by
making b and c sufficiently close to a. More precisely:


$$\frac{f(c) - f(b)}{c - b} = [f'(a) + v]\,\frac{a - b}{c - b} + [f'(a) + w]\,\frac{c - a}{c - b} = f'(a) + v\,\frac{a - b}{c - b} + w\,\frac{c - a}{c - b},$$


where v and w approach 0 as b and c approach a. The "only if" part of the
theorem follows from this. The other part of the proof is similar to Exercise
8.6.5, and is left as Exercise 20.2.2. ∎
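
The weighted-average identity for secant slopes is easy to confirm on a concrete case. The sketch below uses exact rational arithmetic; the test function x^3, the points b = 1/2, a = 1, c = 3, and the helper names are all assumptions chosen for this illustration.

```python
from fractions import Fraction


def slope(f, p, q):
    """Slope of the secant line through (p, f(p)) and (q, f(q))."""
    return (f(q) - f(p)) / (q - p)


def weighted_average(f, b, a, c):
    """Weighted average of the secant slopes over [b, a] and [a, c],
    with weights (a - b)/(c - b) and (c - a)/(c - b)."""
    left = slope(f, b, a) * (a - b) / (c - b)
    right = slope(f, a, c) * (c - a) / (c - b)
    return left + right


def cube(x):
    return x ** 3


b, a, c = Fraction(1, 2), Fraction(1), Fraction(3)
print("slope over [b, c]:", slope(cube, b, c))
print("weighted average :", weighted_average(cube, b, a, c))
print("plain average    :", (slope(cube, b, a) + slope(cube, a, c)) / 2)
```

The first two printed values agree, while the plain average of the two shorter slopes differs from them; the weights are the fractions of [b, c] covered by each shorter interval.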
EXERCISES 20.2 


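1. (a) If $f(x) = x^2$ for $x \in \mathbb{Q}$ and f(x) = 0 for $x \notin \mathbb{Q}$, show that for any ε > 0, δ > 0,
and $s \in \mathbb{R}$, there are numbers $a, b \in (0, \delta)$ with


$$\left| \frac{f(b) - f(a)}{b - a} - s \right| < \varepsilon.$$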


(b) Explain why this means that we can find secant lines whose slopes are 
"just about anything we like." 


(c) Show that f is differentiable at 0 and that f'(0) = 0.
(d) Show directly that f satisfies the conditions of Lemma 20.1. 
(e) Show that the function g(x) = sin x is differentiable at 0 by showing that it 
satisfies the conditions of Lemma 20.1. 
2. (a) Complete the proof of Lemma 20.1. 


(b) How is the stipulation that $x_1$ and $x_2$ are on opposite sides of a used in the
proof of this lemma? 


3. (a) We have noted that the absolute value function is not differentiable at 0, 
roughly because the slopes of its tangent lines make an abrupt change there. 
Explain this precisely by considering Exercise 12.6.2. 


(b) But the function


$$f(x) = \begin{cases} 1 & \text{if } x > 0 \\ -1 & \text{if } x < 0 \end{cases}$$


has a jump discontinuity, and it's the derivative of |x|, isn't it?


20.3 VAN DER WAERDEN'S FUNCTION


We now take another look at the function seen at the beginning of the chapter,
which may be defined as $W(x) = \min\{|x - n| : n \in \mathbb{Z}\}$ [that is, W(x) is the distance
from x to the nearest integer]. Here again is the graph of W:


[Figure: graph of W(x) for −2 ≤ x ≤ 2]


If we look at this graph near any point that is "half an integer," we can find
secant lines with slopes of 1 and −1. In the graph of W(2x), there are more such
changes of slope and they are closer together:


[Figure: graph of W(2x) for −2 ≤ x ≤ 2]


We will want to keep the slopes of these segments ±1. We can do this with a
change of vertical scale. Here is the graph of (1/2)W(2x):


[Figure: graph of (1/2)W(2x) for −2 ≤ x ≤ 2]


Each function (1/4)W(4x), (1/8)W(8x), ..., fails to be differentiable on a set that
is "twice as big" as the one before. (In the sense of cardinality, of course, these
sets are all the same size.) None of these functions is nowhere differentiable, but
they have more and more points of nondifferentiability and seem to get closer to
what we want. We might guess that $\lim_{n \to \infty} (1/2^n)W(2^n x)$ would fill the bill. But
this limit is 0 for all x (be sure you see why), and the function f(x) = 0 is quite
differentiable indeed. However, the functions $(1/2^n)W(2^n x)$ do provide us with
secant lines of slope 1 and slope −1 very close together. We have only to
assemble them in the right way.


THEOREM 20.2: The function $\sum_{n=0}^{\infty} 2^{-n} W(2^n x)$ is continuous everywhere yet
differentiable nowhere.


PROOF: This function is continuous by the Weierstrass M-test. We will use
Lemma 20.1 to show that it is nowhere differentiable. Consider the numbers
$m/2^n$ for integer values of m and positive integer values of n. These are the
dyadic rationals (see Exercise 6.1.13), which are a dense subset of the real line
(so we can find them close to and on either side of any real number). Note that
$W(2^n x) = 0$ for all x of the form $m/2^k$, where n ≥ k. Also, for every n, the slope of any
secant line to $2^{-n} W(2^n x)$ based on adjacent dyadic rationals having the same
denominator $2^k$ is 0, 1, or −1 (it is 0 when n ≥ k and ±1 when n < k; this is clear
from the pictures). Thus the slope of any
such secant line to $\sum_{n=0}^{\infty} 2^{-n} W(2^n x)$ is an integer. Pairs of adjacent dyadic rationals
on opposite sides of any point can be chosen so that the difference between the
slopes of the associated secant lines is not 0, hence this difference must be at
least 1. By Lemma 20.1, this function is not differentiable at any point. ∎
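
The integer slopes used in this proof can be computed exactly. The following sketch evaluates the series at dyadic rationals, where only finitely many terms are nonzero; the sample point a = 1/3 and the helper names are assumptions made for this illustration.

```python
from fractions import Fraction
import math


def W(x):
    """Distance from x to the nearest integer."""
    frac = x - math.floor(x)
    return min(frac, 1 - frac)


def vdw_at_dyadic(m, k):
    """Exact value of sum_{n >= 0} 2^(-n) W(2^n x) at x = m / 2^k.

    Terms with n >= k vanish because 2^n x is then an integer.
    """
    x = Fraction(m, 2 ** k)
    return sum(W(2 ** n * x) / 2 ** n for n in range(k))


def secant_slope(m, k):
    """Slope of the secant line over [m / 2^k, (m + 1) / 2^k]."""
    return (vdw_at_dyadic(m + 1, k) - vdw_at_dyadic(m, k)) * 2 ** k


a = Fraction(1, 3)  # assumed sample point
for k in range(1, 11):
    m = math.floor(a * 2 ** k)  # m / 2^k <= a < (m + 1) / 2^k
    print(k, secant_slope(m, k))
```

Each slope is a sum of k terms equal to ±1, so it is an integer with the same parity as k. Slopes at consecutive levels k and k + 1 therefore differ by at least 1 however close the dyadic points are to a, which is the failure of the condition in Lemma 20.1.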


EXERCISES 20.3 


1. (a) With W(x) as above, show that $\lim_{n \to \infty} (1/2^n)W(2^n x) = 0$.


(b) Discuss the behavior of $\lim_{n \to \infty} W(2^n x)$.

2. (a) Show that the function $f(x) = \sum_{n=0}^{\infty} \frac{1}{2^n} \cos(13^n \pi x)$ is continuous and
nowhere differentiable. This is a version of the function given by
Weierstrass. Notice that the functions that make up this series have no
"corners" at all!


(b) Suppose a is an odd natural number, 0 < b < 1, and $ab > 1 + \frac{3}{2}\pi$. Show that
the function $g(x) = \sum_{n=0}^{\infty} b^n \cos(a^n \pi x)$ is continuous and nowhere
differentiable. (This is an involved proof. Here are some hints: (i) All the
conditions on a and b will come into play eventually. (ii) Look at the
standard difference quotient. (iii) Look first at a "finite" part of the series, and
then at the corresponding "tail." (iv) If the finite part of the series goes to
infinity, the only way the function can have a derivative is if the tail goes to
−∞; make sure the tail is bounded below. (v) When examining the tail of the
series, let $a^m x = \alpha_m + \beta_m$, where $\alpha_m$ is an integer and $-1/2 < \beta_m \le 1/2$, and look
at $h = (1 - \beta_m)/a^m$.)

(c) (Research problem!) Suppose that f is a continuous periodic function 
having an interval on which it is strictly increasing and an interval on which 
it is strictly decreasing. Are there necessarily positive numbers a and b so 
that $\sum_{n=0}^{\infty} b^n f(a^n x)$ is nowhere differentiable?


20.4 SETS OF FIRST AND SECOND CATEGORY 


We had to work pretty hard to get the van der Waerden function. This effort 
might lead us to believe that continuous, nowhere differentiable functions are 
very rare. This is not the case at all, and finding out why will lead us into some 
new territory. 


There are many ways a set can be said to be small. A set with three elements 
is smaller than a set with four, a finite set is smaller than an infinite set, and a 
countable set is smaller than an uncountable one. Topologically, we might 
consider a set to be large if it is dense in the real numbers. In this sense, the 
rationals and irrationals are both large, and an idea of size that doesn't 
distinguish these two sets might not be useful. But we are on the right track. 


DEFINITION 20.3: (a) A set D is dense in the set S if $S \subseteq \overline{D}$ (that is, S is
contained in the closure of D). 


(b) A subset of the real line is said to be nowhere dense if it is not dense in any 
nonempty open interval. 


EXAMPLES 20.4: 1. You will show in Exercise 20.4.3 that a union of finitely 
many nowhere dense sets is nowhere dense. It is easy to see that a set with one 
element is nowhere dense (it is its own closure and clearly contains no open 
interval). Thus any finite set is nowhere dense. 


2. The set of natural numbers is also closed and contains no open interval, so N 
is nowhere dense, even though it is infinite. 


3. The rational numbers and the irrational numbers are dense in any set. The 
rationals are small in the sense of cardinality yet big in this sense. Further, we 
see that an infinite union of nowhere dense sets can be dense. The set of rational 
numbers is a countable union of sets with one element. 


4. The Cantor set (Exercise 8.4.7) is closed (and hence equal to its closure) yet 
contains no open interval. Though it is uncountable, the Cantor set is nowhere 
dense. It is big in the sense of cardinality but small in this sense. This is one 
reason the Cantor set is so interesting. 


The third example above tells us where we should look next. The rational 
numbers and the irrational numbers are both dense in the whole real line, but 
they are different in that the set of rational numbers can be built from a countable 
union of nowhere dense sets. The set of irrational numbers, we will find, can't. 


DEFINITION 20.4: A set is of first category if it can be written as a countable 
union of nowhere dense sets; otherwise, it is of second category. (These bland 
names are sometimes replaced by the more descriptive meager and nonmeager, 
respectively.) 


Note that a set that is a countable union of first category sets is also of first 
category. The following theorem strengthens this observation. 


THEOREM 20.5: (The Baire Category Theorem) If {S_n : n = 1, 2, ...} is a 
countable collection of closed sets whose union contains a nonempty open 
interval, at least one of the sets S_n must contain a nonempty open interval. 


Observe that the negation of the conclusion ("None of the sets S_n contains a 
nonempty open interval") seems to give us more information than the conclusion 
itself. This means a proof by contradiction is appropriate. Beyond this beginning, 
we should notice that the proof merely takes advantage of the facts available. 
The appearance of smaller and smaller intervals suggests the Nested Intervals 
property (though some care is necessary to ensure that a nest of closed intervals 
can be produced). 


PROOF: Suppose that none of the sets S_n contains a nonempty open interval but 
that (a_0, b_0) ⊆ ∪_n S_n. Since (a_0, b_0) is not contained in S_1 (this is the 
assumption), there exists x_1 ∈ (a_0, b_0)\S_1. Since S_1 is closed and x_1 ∉ S_1, 
x_1 is not a cluster point of S_1, and so we may find an open interval (a_1, b_1) 
containing x_1, disjoint from S_1, and such that the interval [a_1, b_1] is 
contained in (a_0, b_0) [be sure you see why this last condition can be met]. Now 
(a_1, b_1) is not contained in S_2. We may repeat this argument to find 
x_2 ∈ (a_1, b_1)\S_2 and (a_2, b_2) that contains x_2, is disjoint from S_2, and 
is such that [a_2, b_2] is contained in (a_1, b_1). 

Now [a_1, b_1], [a_2, b_2], ..., is a nest of closed bounded intervals, hence has 
a nonempty intersection by the Nested Intervals property. Now let 
x ∈ ∩_n [a_n, b_n]. Since (a_n, b_n) ∩ S_n = ∅ and 
x ∈ [a_{n+1}, b_{n+1}] ⊆ (a_n, b_n) for all n, we have x ∉ S_n for all n. But 
x ∈ [a_1, b_1] ⊆ (a_0, b_0) ⊆ ∪_n S_n, and this is a contradiction. ■ 
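
The interval-shrinking step of this proof can be tried out in the simplest case, where each closed set S_n is a single point. The Python sketch below is only an illustration under that assumption: the helper name `shrink_past` and the decision to split each interval into fifths are choices made here, not part of the proof.

```python
from fractions import Fraction

def shrink_past(a, b, s):
    """Return a closed interval [c, d] inside the open interval (a, b) that misses s.

    Hypothetical helper: splits (a, b) into fifths and keeps the second or
    the fourth fifth, whichever does not contain s.
    """
    w = (b - a) / 5
    left = (a + w, a + 2 * w)
    right = (a + 3 * w, a + 4 * w)
    # The two candidates are disjoint, so s lies in at most one of them.
    return right if left[0] <= s <= left[1] else left

# S_n = {s_n}: one-point closed sets, none of which contains an open interval.
points = [Fraction(1, 2), Fraction(1, 3), Fraction(2, 3), Fraction(1, 4), Fraction(3, 10)]

a, b = Fraction(0), Fraction(1)      # the interval (a_0, b_0)
nest = []
for s in points:
    a, b = shrink_past(a, b, s)      # [a_n, b_n] lies inside (a_{n-1}, b_{n-1})
    nest.append((a, b))

x = (a + b) / 2                      # a point of the smallest interval built so far
print(nest[-1], x)
```

With finitely many points the last interval is obviously nonempty; the Nested Intervals property is what guarantees a common point when the construction runs through infinitely many sets.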


COROLLARY 20.6: The set of real numbers is of second category. 


PROOF: Suppose that the real numbers are written ℝ = ∪_n S_n. Then it is also the 
case that ℝ = ∪_n S_n⁻ (where S_n⁻ is the closure of S_n), and so ∪_n S_n⁻ is a 
countable union of closed sets that contains a nonempty open interval. By the 
Baire Category theorem, one of the sets S_n⁻ must contain a nonempty open 
interval. That is, at least one of the sets S_n cannot be nowhere dense. Thus ℝ is 
not of first category. ■ 


COROLLARY 20.7: The set of irrational numbers is a second category subset 
of the real numbers. 


PROOF: Left as Exercise 20.4.5. ■ 


The proof of Theorem 20.5 rests on the completeness of the real numbers (in the 
guise of the Nested Intervals property) yet draws from it a purely topological 
result. This is a good indication that something important is at stake here. The 
question of whether a topological space is of second category is of great interest 


in advanced applications, and spaces that are not of second category are very 
different from those that are. This is a major concern in the subject called 
functional analysis. One must avoid the impression that "the biggest set around" 
is always of second category. This is why Corollary 20.6 is important. We could, 
for instance, build a theory of calculus around the rational numbers without ever 
mentioning any bigger sets, but that wouldn't make Q a set of second category, 
and the "calculus" we would obtain would be very different from the one with 
which we are familiar. 


EXERCISES 20.4 


1. Show that a set that is dense in the real line is also dense in any subset of 
the real line. 


2. Show that a set is nowhere dense if the interior of its closure is empty. 


3. Show that a union of finitely many nowhere dense sets is nowhere dense. 


4. (a) Show that a countable union of sets of first category is of first category. 

(b) Show that a subset of a set of first category is of first category. 


5. Prove Corollary 20.7. 


6. (a) If A is a first-category subset of B and B is of second category, show that 
A ≠ B. 

(b) Show that irrational numbers exist. 


7. Is the Baire Category theorem equivalent to the completeness of ℝ? (That is, 
does it imply a part of the Big Theorem?) 


8. (a) In Exercise 8.4.7.1, you constructed a fat Cantor set (one whose length is 
not 0). Show that this set is nowhere dense. 

(b) If f(x) = px + q, the set f(S) is called a linear translation of S. Show that 
any linear translation of a nowhere dense set is nowhere dense. 

(c) Given a < b, show that [a, b] is a linear translation of [0, 1]. 

(d) Construct (from the fat Cantor set) a subset of [0, 1] that is of first 
category but whose length is 1 (we may assume that the length of a union of 
countably many disjoint sets is the sum of the lengths of the individual sets). 


20.5 THE SET OF DIFFERENTIABLE FUNCTIONS 


The proof of Lemma 20.8 is left as Exercise 20.5.1. 


LEMMA 20.8: If S is a closed set whose complement is everywhere dense, then 
S is nowhere dense. ■ 


The idea of category makes sense in any topological space since it can be 
defined in terms of the purely topological notions of the closure and interior of a 
set. In particular, we can discuss category in the spaces we have made of sets of 
functions. A set of first category is, in the topological sense, very small. The next 
theorem tells us that we spent all our time in calculus studying functions selected 
from a set that, topologically, was hardly there at all! The van der Waerden 
function is not a rarity at all but typical.¹ Notice, though, that while Theorem 
20.9 is quite dramatic, we should be surprised by it only if we assume that the set 
of continuous functions on an interval is of second category. This is true, but we 
won't prove it. 


THEOREM 20.9: Among the continuous functions f : (0, 1) → ℝ, those that are 
differentiable, even at a single point, form a set of first category. 


PROOF: We will consider only "right-hand" derivatives (the proof gives us a 
little more than the theorem states). Let S_n be the set of functions f such that, 
for some x ∈ (0, 1 − 1/n) and all h ∈ (0, 1/n), we have 
|[f(x + h) − f(x)]/h| ≤ n. (The choices of x and h serve to guarantee that f is 
defined at both x and x + h.) Any function having a right-hand derivative at any 
point is certainly in one of these sets. We will show that the sets S_n are closed 
and that their complements are dense. We will use the sequential characterization 
of closed sets. Suppose (f_k) is uniformly convergent (that is, convergent in the 
space of functions). Then lim f_k is continuous and, since the operation that 
takes f to |[f(x + h) − f(x)]/h| is continuous, lim f_k also satisfies the 
inequality defining S_n (that is, if f_k ∈ S_n for all k, then lim f_k ∈ S_n). 
Thus S_n is closed. According to Lemma 20.8, we may show that S_n is nowhere dense 
by showing that R\S_n is dense (that is, that there is an element of R\S_n close, 
in the uniform sense, to any continuous function). This is automatically true for 
any function in R\S_n, and so we need only consider a function g ∈ S_n. Let ε > 0 
be given and let r be a function such that |r(x)| < ε for all x and that any 
interval contains points x where |r'(x)| > 2n [the function W used in the 
construction of van der Waerden's function can be adjusted to do this]. Then 
g + r ∈ R\S_n (there are points where |(g + r)'(x)| > n) and g + r is uniformly 
close to g (since, in the uniform norm, ‖(g + r) − g‖ = ‖r‖ < ε). Thus R\S_n is 
dense in the set of continuous functions for each n. ■ 


EXERCISES 20.5 


1. (a) Prove Lemma 20.8. 


(b) Show that the conclusion of Lemma 20.8 does not hold if the set is not 
assumed to be closed. 


2. (a) Explain why Theorem 20.9 is "only surprising if ... the set of continuous 
functions on an interval is of second category." 


(b) Does Theorem 20.9 establish that continuous, nowhere differentiable 
functions exist? 


3. Verify the statement "Any function having a right-hand derivative at any 
point is in one of these sets" made in the proof of Theorem 20.9. 


4. Why does the proof of Theorem 20.9 establish more than the theorem states? 

1 The word "typical" is used nowadays to mean "except for a set of first category"; hence statements 
like "The typical real number is irrational." A set whose complement is of first category is called "residual," 
a curious term, since we normally think of a "residue" as the smaller portion of something. 


Chapter 21 


Continuous Functions and Integrability 


21.1 INTEGRABLE FUNCTIONS (REVISITED) 


We have seen that a continuous function is Riemann integrable, while a function 
as wildly discontinuous as the Dirichlet function is not. In this chapter we will 
examine the intermediate ground of this situation and ask how badly a function 
can fail to be continuous and still be integrable. Though the main result 
(Theorem 21.6) is extremely important, its complete proof is beyond the scope 
of this book. Our goal at this time is mainly to observe how a simple result with 
a simple proof can blossom into something much deeper. 


THEOREM 21.1: If the bounded function f : [a, b] → ℝ is continuous except at 
one point, then f is integrable. 


PROOF: Suppose f is discontinuous at c ∈ (a, b) (the proof must be adjusted if c 
is a or b) and that |f(x)| ≤ M for all x. Let ε > 0 be given. Choose a number η > 0 
so that [c − η, c + η] ⊆ (a, b) and Mη < ε/12 (the reason for this choice will be 
evident shortly). 


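[Figure: sketch of a function on [a, b] with a jump discontinuity at c; the horizontal axis is marked at a, c − η, c + η, and b.] 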


Since f is continuous on [a, c − η] and [c + η, b], it is integrable on both of 
these intervals. Thus there are partitions P_1 and P_2 of [a, c − η] and 
[c + η, b], respectively, so that 


U(f, P_1) − L(f, P_1) < ε/3 and U(f, P_2) − L(f, P_2) < ε/3. 


Over [c − η, c + η] (whose length is 2η), we have sup f − inf f ≤ 2M. Note that 
(after an appropriate renumbering), P = P_1 ∪ P_2 is a partition of [a, b]. Then we 
have 


U(f, P) − L(f, P) 
    ≤ [U(f, P_1) − L(f, P_1)] + [U(f, P_2) − L(f, P_2)] + (2M)(2η) 
    < ε/3 + ε/3 + ε/3 = ε, 


and so f is integrable by Theorem 17.8. ■ 
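
The estimate in this proof can be checked numerically on a concrete function. The sketch below is an illustration under stated assumptions: the function f, the jump location C = 1/2, the bound M = 2, and the helper names `gap` and `uniform` are all choices made here. Because this particular f is increasing, its supremum and infimum on each subinterval occur at the endpoints, which keeps the computation exact.

```python
from fractions import Fraction

C = Fraction(1, 2)                       # location of the jump (assumed for this sketch)

def f(x):
    """Increasing on [0, 1], continuous except for a jump at C; |f| <= 2."""
    return x if x < C else x + 1

def gap(points):
    """U(f, P) - L(f, P); for an increasing f, sup and inf occur at the endpoints."""
    return sum((f(r) - f(l)) * (r - l) for l, r in zip(points, points[1:]))

def uniform(lo, hi, m):
    """Partition of [lo, hi] into m equal subintervals."""
    return [lo + (hi - lo) * Fraction(k, m) for k in range(m + 1)]

eps, M = Fraction(1, 10), 2
eta = eps / (13 * M)                     # M * eta = eps/13 < eps/12
P1 = uniform(Fraction(0), C - eta, 8)    # partition of [a, c - eta]
P2 = uniform(C + eta, Fraction(1), 8)    # partition of [c + eta, b]
P = P1 + P2                              # [c - eta, c + eta] is one subinterval of P

print(float(gap(P1)), float(gap(P2)), float(gap(P)))
```

The middle subinterval contributes at most (2M)(2η) no matter how f behaves near c, which is why the proof never needs to look at the discontinuity itself.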
The proof of Theorem 21.1 can be modified to yield: 


COROLLARY 21.2: If f : [a, b] → ℝ is continuous except at finitely many 
points, then f is integrable. 


PROOF: Left as Exercise 21.1.2. ■ 


EXERCISES 21.1 
1. (a) Draw a picture to illustrate the construction of the partition P in the proof 
of Theorem 21.1. 


(b) The sketch that accompanies Theorem 21.1 shows a function with a jump 
discontinuity. Is this used anywhere in the proof? Is the proof valid if the 
discontinuity is of the second type? 


2. Prove Corollary 21.2. 
21.2 SETS OF CONTENT ZERO 


The proof of Corollary 21.2 hinges on the fact that we can enclose the points of 
discontinuity of f in a finite collection of intervals whose total length is as 
small as we wish. We give this property a name: 


DEFINITION 21.3: The set S is said to have content zero if for any ε > 0 there 
is a finite collection of open intervals {J_n}, the sum of whose lengths is less 
than ε and such that S ⊆ ∪_n J_n. 


EXAMPLES 21.2: 1. Any finite set has content 0. If ε > 0 is given and the set 
has n elements, we can enclose each element in an interval of length ε/(n + 1). 


2. H = {1, 1/2, 1/3, ...} has content 0. If ε > 0 is given, let J_1 = (−ε/4, ε/4). 
Note that J_1 contains all but finitely many of the elements of H and has length 
ε/2. We can cover the rest of the elements of H with intervals of total length 
less than ε/2 as in Example 1. 


3. You will show in Exercise 21.2.1 that the set of rational numbers in the 
interval [0, 1] does not have content 0 even though, like the set H in the previous 
example, it is countable. Whether a set has content 0 depends on the way it is 
distributed on the number line as well as on its cardinality. This is made even 
more apparent in the next example. 


4. The Cantor set (which is uncountable) has content 0. We observed that the sum 
of the lengths of the open intervals removed in the construction of the Cantor set 
is 1. If ε > 0 is given, we can stop the process when the sum of the lengths of the 
(finitely many) intervals removed to that point is greater than 1 − ε/2. The Cantor 
set is contained in a finite collection of closed intervals, the sum of whose 
lengths is less than ε/2. These intervals may be enclosed in turn in open intervals, 
the sum of whose lengths is less than ε. 
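
The covering described in the Cantor set example can be built explicitly. The following Python sketch is one way to do it, under assumptions made here: the helper names `cantor_stage` and `open_cover` are hypothetical, and each closed piece is widened by the same small amount so that the widening adds exactly half of the allowed total.

```python
from fractions import Fraction

def cantor_stage(k):
    """Closed intervals left after k rounds of removing open middle thirds."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(k):
        nxt = []
        for a, b in intervals:
            third = (b - a) / 3
            nxt += [(a, a + third), (b - third, b)]
        intervals = nxt
    return intervals

def open_cover(eps):
    """Finitely many open intervals of total length < eps containing the Cantor set."""
    k = 0
    while Fraction(2, 3) ** k >= eps / 2:   # the closed pieces have total length (2/3)^k
        k += 1
    closed = cantor_stage(k)
    pad = eps / (4 * len(closed))           # widening every piece adds eps/2 in total
    return [(a - pad, b + pad) for a, b in closed]

cover = open_cover(Fraction(1, 10))
total = sum(b - a for a, b in cover)
print(len(cover), float(total))
```

The number 1/4 belongs to the Cantor set (its ternary expansion is 0.020202...), so it makes a convenient spot check that the cover does its job.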


The proof of the next theorem is almost the same as that of the last, and so we 
will gloss over the details. 


THEOREM 21.4: Suppose the bounded function f : [a, b] → ℝ is continuous 
except on a set of content zero. Then f is Riemann integrable. 


PROOF: Suppose |f(x)| ≤ M for all x and let ε > 0 be given. Enclose the points of 
discontinuity of f in open intervals of total length less than ε/(6M). The 
complement of the union of this collection of intervals is a union of finitely 
many closed intervals on each of which f is integrable. Find partitions of each of 
these intervals whose corresponding upper and lower sums are close together. 
Assemble these partitions and the original collection of intervals into a partition 
of [a, b] whose corresponding upper and lower sums are close together. ■ 


EXERCISES 21.2 


1. Show that ℚ ∩ [0, 1] does not have content 0. 


2. Supply the details for the proof of Theorem 21.4. 


3. We may say that a subset of the plane has "two-dimensional content 0" if it 
can be contained in finitely many open rectangles, with sides parallel to the 
coordinate axes, whose total area is as small as we like. 


(a) Draw a picture illustrating this. 


(b) Show that the graph of a uniformly continuous function whose domain is 
a compact set has content 0. 


(c) Does the result in (b) remain true if the assumptions of uniform continuity 
or compactness are dropped? (What if the function is continuous but the 
domain is not compact? Or if the function is continuous but not uniformly 
continuous?) 


(d) Show that the graph of a Riemann integrable function has content 0. 
(e) Is (d) a consequence of (b)? Vice versa? 


(f) Show that the graph of the Dirichlet function on [0, 1] (see Chapter 19) 
has two dimensional content 0 according to this definition even though the 
Dirichlet function is not Riemann integrable. 


(g) The top half of the graph of the Dirichlet function seems to be a copy of 
the rational numbers, but we saw that ℚ ∩ [0, 1] does not have content 0. 
Explain. 


(h) Reconsider these questions with the definition of "two-dimensional 
content 0" taken to be "can be contained in finitely many open rectangles, 
with sides parallel to the coordinate axes, the sum of whose diameters is as 
small as we like." (The diameter of a rectangle is the length of a diagonal.) 

(i) Reconsider these questions yet again, with the definition of "two- 
dimensional content 0" taken to be "can be contained in finitely many sets (of 
any sort), the sum of whose diameters is as small as we like." [The diameter 
of a nonempty set is the supremum of the collection of distances between 
points in the set: D(S) = sup{d(x, y) : x, y ∈ S}, where d(x, y) is the distance 
measurement appropriate to the setting.] 


4. Show that the diameter function D defined in the previous problem has the 
following properties: 


(a) D(S) ≥ 0 for all S. 
(b) D(kS) = |k|D(S) (see Exercise 5.2.12 for the definition of kS). 


(c) If S ⊆ T, then D(S) ≤ D(T). 
(d) Give an example of a nonempty set whose diameter is 0. 
(e) Show that it is consistent with these properties to set D(∅) = 0. 


(f) Show that it is not consistent with these properties to assign any value 
other than 0 to D(∅). 


(g) On the other hand, the definition of D given in Exercise 21.2.3.i suggests 
that D(∅) = −∞. Explain. 


(h) Can any general statements be made about the diameters of unions and 
intersections? 


5. (a) The quantity called "two-dimensional content 0" in part (i) of the 
previous problem is also called "linear content 0." Discuss why this is an 
appropriate term. 


(b) Suppose S is a subset of the real line, considered as the horizontal axis in 
the plane. Show that S has content 0 (as a subset of ℝ) if and only if S has 
linear content 0. 


(c) Show that the result in (b) is not true if we use the definition of "two- 
dimensional content 0" from the first part of the previous problem. In fact, 
show that every bounded subset of the real line has two-dimensional content 
0 under that definition. 


21.3 SETS OF MEASURE ZERO 


Theorem 21.4 is not the final word on this topic. In Exercise 21.3.7 you will 
examine a function that is discontinuous precisely on the set of rational numbers, 
yet is integrable nonetheless. In Definition 21.3, the collection of intervals used 
must be finite, and this was necessary to make our examples and proofs work. 
Unfortunately, it obscures the real issue of integrability. 


DEFINITION 21.5: The set S is said to have measure zero if for any ε > 0 
there is a countable collection of open intervals I_n, the sum of whose lengths is 
less than ε and such that S ⊆ ∪_n I_n. 


Notice that the sum of the lengths of the intervals in Definition 21.5 is actually a 
series if the collection of intervals is infinite. The requirement of countability in 
this definition does not restrict our activities much. The sum of any uncountable 
collection of positive numbers is infinite (see Exercise 13.2.4), and so if the sum 
of the lengths of the intervals is less than ε, only countably many of them are 
nonempty. Furthermore, recall that any union of open intervals is an open set and 
that an open set can be written as a union of countably many open intervals 
(Theorem 8.11). 


EXAMPLES 21.3: 1. Enumerate the rational numbers: r_1, r_2, .... Let ε > 0 be 
given and let I_n = (r_n − ε/2^(n+2), r_n + ε/2^(n+2)). Then ℚ ⊆ ∪_n I_n, the 
length of I_n is ε/2^(n+1), and the sum of the lengths of the intervals I_n is 
ε/2 < ε. Thus the set of rational numbers has measure zero (it is small in this 
sense despite the fact that it is dense). This proof can be modified to show that 
any countable set has measure zero. 
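
The bookkeeping in this example is easy to verify for the first few intervals. The sketch below is only an illustration: the particular enumeration of the rationals (0 first, then ±p/q in lowest terms ordered by p + q) and the function names `rationals` and `cover` are assumptions made here, since the example allows any enumeration.

```python
from fractions import Fraction
from itertools import islice
from math import gcd

def rationals():
    """Yield each rational exactly once: 0, then ±p/q in lowest terms by increasing p + q."""
    yield Fraction(0)
    total = 2
    while True:
        for p in range(1, total):
            q = total - p
            if gcd(p, q) == 1:
                yield Fraction(p, q)
                yield Fraction(-p, q)
        total += 1

def cover(eps, count):
    """First `count` intervals I_n = (r_n - eps/2^(n+2), r_n + eps/2^(n+2)), n = 1, 2, ..."""
    return [(r - eps / 2 ** (n + 2), r + eps / 2 ** (n + 2))
            for n, r in enumerate(islice(rationals(), count), start=1)]

eps = Fraction(1, 100)
intervals = cover(eps, 50)
length = sum(b - a for a, b in intervals)
print(float(length), float(eps / 2))
```

Every partial sum of the lengths stays below ε/2, however many intervals are taken; only the full series reaches ε/2.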


The following theorem does tell the whole story of Riemann integrability, but its 
proof would take us too far afield. The proof of Theorem 21.4 relied on the fact 
that the complement of a union of finitely many intervals is again a union of 
finitely many intervals. This is not so if the former collection is not finite. (The 
Cantor set is the complement of a union of countably many intervals, but 
contains no interval.) 


THEOREM 21.6: A function f : [a, b] → ℝ is Riemann integrable if and only if 
the set of points at which f is discontinuous has measure zero. ■ 


EXERCISES 21.3 


1. Show that any set of content 0 also has measure 0. 


2. Find an irrational number not in the union of the intervals used to show that 
the rational numbers have measure zero. 


3. Show that any countable set has measure zero and is of first category. 

4. Show that any subset of a set of measure zero has measure zero. 

5. Show that a countable union of sets of measure zero has measure zero. 

6. (a) We might define the length of an open set to be the sum of the lengths of 


the open intervals in the representation of the set given in Theorem 8.11 (this 
might, of course, be infinite). Show that the length of a union of open sets is 


not greater than the sum of the lengths of the sets. 


(b) Show that the length of a union of disjoint open sets is equal to the sum of 
the lengths of the sets. 


(c) Show that a set has measure 0 if and only if, given ε > 0, it can be 
contained in an open set whose length is less than ε. 


7. Define a function on [0, 1] by f(x) = 1/n if x = m/n ∈ ℚ is in lowest terms 
and f(x) = 0 if x ∉ ℚ. Show that f is continuous at each irrational number and 
discontinuous at each rational number. By Theorem 21.6, this means that f is 
integrable. Find its integral. 


8. We have seen several ways in which a set can be considered to be small. 
Cardinality, category, content, and measure all provide us with ideas of 
smallness. Discuss the relationships among these concepts. For instance, a 
countable set must have measure zero and also must be of first category but 
need not have content zero. On the other hand the fat Cantor set is a set that 
is of first category whose measure is not zero. An examination of all possible 
combinations is a major undertaking. 


21.4 A SPECULATIVE GLIMPSE AT MEASURE THEORY 


The idea of measure 0 is a hint of the much larger topic of measure theory. One 
of the first big surprises of measure theory is that, while it is very easy to define 
"measure 0," it is very difficult to define "measure." Even so, we can guess at 
some of the ideas of measure theory by thinking about a topic with considerably 
more intuitive familiarity: Probability. 


If we select an element of [0, 1] at random (though it is not at all clear what 
this means or whether it is possible in any practical sense), there is, it seems, a 
probability of 1/2 that it will be in [0, 1/2] because the length of [0, 1/2] is half 
that of [0, 1]. If a subset of [0, 1] is open (so its length may be computed as in 
Exercise 21.3.6), the probability that a randomly chosen element is in that set 
should be the length of the set.¹ If two open sets are disjoint, the length of their 
union is the sum of their lengths. This might lead us to make the following 
guess. Unfortunately, we aren't prepared to decide whether this guess is true or 
not. That's why it's a guess! Sorting this out is one of the first major issues in the 
study of measure theory. We will say something in a moment that will cast 
doubts on the whole enterprise. 


GUESS 21.7: If A ∩ B = ∅, the probability of selecting an element of A ∪ B is the 
sum of the probability of selecting an element from A and the probability of 
selecting an element from B (in other words, the measure of a disjoint union is 
the sum of the measures of the two sets). 


If one set is contained in another, the probability of selecting an element of the 
smaller one should be no larger than that of selecting an element of the larger 
one. The definition of "measure 0" is that the set in question can be contained in 
an open set of arbitrarily small length. This leads to the following (correct) 
guess: 


GUESS 21.8: If a number is selected from [0, 1] at random and S ⊆ [0, 1] has 
measure 0, the probability that the element is in S should be 0. 


(If this guess is correct, the probability that a randomly selected number is 
rational is 0.) We have been very conservative in Guess 21.8 since we know the 
meaning of "measure 0." We might have been led by the preceding discussion 
to make the following incorrect guess: 


GUESS 21.9: If A ⊆ B ⊆ [0, 1], then the probability that a randomly selected 
element is in A is not larger than the probability that it is in B. 


How could this fail to be true? We now come to the only real result of this 
section.² This theorem, as we have said, throws a shadow over any guesses we 
might make concerning probability or measure and warns us that the subject is 
not easy. We will assume that Guess 21.7 holds for countable unions under 
"suitable conditions" (what these conditions might be, we can only guess). 


THEOREM 21.10: Under the assumptions mentioned above, there exists a 
nonmeasurable set (a set to which it is impossible to attach a probability). 


PROOF: We will gather the elements of the interval [−1/2, 1/2] into a collection 
of classes in this way: Two numbers will be in the same class if they differ by a 
rational number (for instance, all the rational numbers in the interval are in one 
class, π/9 is in a different class, and π/9 + 1/37 is in the same class as π/9). 
These classes are clearly disjoint, and each of them is countable. There must be 
uncountably many of these classes since their union is the uncountable set [−1/2, 
1/2]. Let B be a set that consists of one element chosen from each of these 
classes. We will show that B can't be measurable. Each element of [−1, 1] differs 
from some element of [−1/2, 1/2] by a rational number in [−1/2, 1/2]. It follows 
that the union of translations³ of B by rational elements of [−1/2, 1/2] is [−1, 1]. 
Each of these translations should have the same measure as B. If the measure of 
B is 0, then this (countable) union must also have measure 0. But [−1, 1] doesn't 
have measure 0 (its measure would seem to be 2). Perhaps B has some positive 
measure, say β. Then it must be that 2 = β + β + β + ..., which is impossible. 
This contradicts the assumption that the rules of measure theory (which we have 
only guessed at!) apply to this set B. We must conclude that B is nonmeasurable. 


With Theorem 21.10 in mind, we can be a little more cautious and update Guess 
21.9 (which was incorrect) to the following (which is correct): 


GUESS 21.11: If A ⊆ B and both A and B are measurable (if a probability can 
be assigned to them), then the measure of A is not more than the measure of B. 


Thinking about probability some more, if we specify a subset of [0, 1] and 
choose an element at random, it seems that the probability that the chosen 
number is in the set should be 


[1 — (the probability that it is not)]. 
This leads us to the following (correct) guess: 
GUESS 21.12: If a set is measurable, its complement is measurable. 


Now we are really getting somewhere. We suspect very strongly that open sets 
are measurable and that we can find their measures (Exercise 21.3.6). Guess 
21.12 would mean that closed sets are also measurable. How would we find the 
measure of a closed set? A closed set is not necessarily a union of intervals like 
an open set, and so that method won't work. Suppose for the moment that our 
closed set can be contained in some interval (a, b). Then its measure should be 


[(b — a) — (the measure of its (open) complement)]. 


This is well and good, but not every closed set (nor every measurable set) is 
bounded. Notice that we do have all the machinery needed to check the property 
in the next definition. 


DEFINITION 21.13: The set S is essentially bounded if there is an open, 
bounded interval (a, b) so that S\(a, b) has measure 0. 


GUESS 21.14: (a) A set that is not essentially bounded is either nonmeasurable 
or has infinite measure. 


(b) If S is an essentially bounded, closed set and (a, b) is as in the definition, the 
measure of S is (b — a) — [the measure of (a, b)\S]. 


Now things are beginning to come together. In Exercise 8.3.2 we saw how we 
can approximate a set from the inside with an open set (its interior) and from the 
outside with a closed set (its closure). In view of Guess 21.11, such an 
approximation would give us an estimate of the measure of a set. Unfortunately, 
this is not what we want, for a very simple reason: It doesn't work! The closure 
of [0, 1] ∩ ℚ is [0, 1]. We know that [0, 1] ∩ ℚ has measure 0, but its closure 
seems to have measure 1. This is not a very good approximation. The trick is to 
turn "interior" and "closure" on their heads. 


DEFINITION 21.15: (a) The outer measure of a set S is the infimum of the 
measures of all open sets containing S. 


(b) The inner measure of S is the supremum of the measures of all essentially 
bounded, closed sets contained in S. 


(c) A set is measurable if its inner measure and outer measure are the same, in 
which case the measure of the set is this common value. 


Now this has a nice familiar ring to it. Not only does it look a lot like many of 
the things we've done in the past, but the computation of an outer measure is 
very much like the process by which we determine that a set has measure 0. You 
stand poised and ready to leap into the study of measure theory in much the way 
that you were ready to leap into real analysis when you began this text. You 
begin this journey in the company of an old friend, a variant of the Cauchy 
criterion. 


THEOREM 21.16: A bounded set S is measurable if and only if given ε > 0 
there is a closed set A and an open set B such that A ⊆ S ⊆ B and the difference 
between the measures of A and B is less than ε. ■ 


EXERCISES 21.4 


1. (a) Suppose that A and B are measurable sets with the property that there are 
disjoint open sets C and D with A ⊆ C and B ⊆ D. (This is stronger than 
simply saying A and B are disjoint.) Show that the measure of A ∪ B is the 
sum of the measures of A and B. 
(b) Give examples of disjoint sets that don't have the property in (a). 
(c) Discuss Guess 21.7. 


2. (a) Several assumptions about measure and probability are made in the proof 
of Theorem 21.10. Find them and discuss whether they are reasonable. 


(b) Verify that every element of [−1, 1] differs from some element of [−1/2, 
1/2] by a rational element of [−1/2, 1/2]. 


(c) A really big assumption about set theory was made in the proof of 
Theorem 21.10. It is found in the statement "Let B be a set that consists of 
one element from each of these sets." Discuss what is being assumed here 


and whether it is reasonable.⁴ 


3. Consider the set given by S = ∪_{n=1}^∞ [n, n + 1/2^n]. 
(a) Show that S is closed. 
(b) Show that S is not essentially bounded. 
(c) Explain why it is reasonable to say that the measure of S is 1. 
(d) Prove that the measure of S is 1, using Definition 21.15. 
(e) Discuss Guess 21.14.a. 


4. Prove Theorem 21.16. 


5. (a) Suppose S has content 0 and that f : ℝ → ℝ is a differentiable function 
whose derivative is bounded. Show that f(S) has content 0. 


(b) Show that this remains true if f is only assumed to be absolutely 
continuous (see Exercise 17.7.7). 


(c) Show that (a) is true with "content" replaced by "measure." 


(d) Let K be the Cantor function (constructed in Chapter 19) and C the 
Cantor set (which has measure 0). Show that K(C) = [0, 1], and consequently 
that the continuous image of a set of measure 0 can have positive measure 
(small can be continuously deformed into big!). What does this say about the 
derivative of the Cantor function? What does this say about the possibility 
that the Cantor function is absolutely continuous? 


6. (a) Recall the brief look at Lebesgue integration we took in Exercise 17.7.8. 


Show that the Dirichlet function is Lebesgue integrable, and that its integral 
is 0. 
(b) Assuming for the moment that any Riemann integrable function is also 


Lebesgue integrable, discuss whether a function that differs from a Riemann 
integrable function on a set of measure 0 is necessarily Lebesgue integrable. 


1 To be more precise, this probability is the length of the subset divided by the length of the whole set 
from which the number is chosen. 


2 Keep in mind that the object of these selected shorts is to explore how bad things can get! 


3 Recall that the translation of a set S by a number x is denoted S + x and is given by: 
S + x = {y : ∃ s ∈ S ∋ (y = s + x)} 


4 WARNING!! There is much more involved here than meets the eye. If you convince yourself that it is 
always possible to select an element from each of a collection of nonempty sets (without the use of some 
formula), read "The Banach-Tarski paradox" [Karl Stromberg, American Mathematical Monthly 86 (1979), 
151-161]. The third paragraph alone will be enough to shake your confidence. 


Chapter 22 


We Build the Real Numbers 


22.1 DO THE REAL NUMBERS REALLY EXIST? 


We have spent a good bit of time thinking about the field of real numbers and 
have seen six manifestations of its property of completeness, which distinguishes 
it from other ordered fields. What we have not done is show that it exists! None 
of what we've done has anything to say about the question: 


Is there an Archimedean ordered field that has the Least Upper Bound property? 


In this chapter, we will show that such a thing exists in the most convincing 
way possible: We will build one. Here is what we must do: 


(1) Define the phrase "real number." 
(2) Define addition and multiplication. 
(3) Show that this structure is a field. 
(4) Define a positive set. 

(5) Show that the field is complete. 


22.2 DEDEKIND CUTS 


DEFINITION 22.1: A subset C of the rational numbers is called a Dedekind 
cut (or simply a cut) if: 


(i) C ≠ ℚ and C ≠ ∅, 
(ii) if p ∈ C and q < p, then q ∈ C, 
and (iii) if p ∈ C, there is a q > p with q ∈ C. 


EXAMPLES 22.2: 1. It is easy to check that {p ∈ Q : p < 1/4} is a cut and that 
{p ∈ Q : p ≤ 1/4} is not [condition (iii) is violated]. 

2. A = {p ∈ Q : p > 0 and p^2 < 2} ∪ {p ∈ Q : p ≤ 0} is a cut. 

(i) 0 ∈ A, and so A ≠ ∅; 2 ∉ A, and so A ≠ Q. 


(ii) Suppose p ∈ A and q < p. If q ≤ 0, then q ∈ A. If q > 0, then 0 < q < p, and so 
0 < q^2 < p^2 < 2 and q ∈ A. 

(iii) Let p ∈ A. If p ≤ 0, then p < q = 1 ∈ A. Suppose p > 0. As in the proof of 
Theorem 5.4, we may find a positive rational number, r, with (p + r)^2 < 2. Then 
q = p + r fulfills the definition. (If p is rational, so is r = (2 − p^2)/7.) 


For any rational number p, let p* = {q ∈ Q : q < p}. It is clear that p* is a cut. 
These are called the rational cuts. The first set in Example 1 is a rational cut, 
but even though the set A in Example 2 is a cut, it is not a rational cut. We will 
see shortly that A is actually √2. 
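
The membership tests behind these examples can be sketched in Python. This is only an illustration and not part of the construction: the function names (in_rational_cut, in_A, step_up) are invented here, and a cut is modeled simply as a yes/no test on exact rationals (fractions.Fraction). 

```python
from fractions import Fraction

def in_rational_cut(p, q):
    """Membership test for the rational cut p* = {q in Q : q < p}."""
    return q < p

def in_A(p):
    """Membership test for A = {p > 0 and p^2 < 2} U {p <= 0} (Example 2)."""
    return p <= 0 or p * p < 2

def step_up(p):
    """Given p in A, return a larger element of A, as condition (iii) requires.

    For p <= 0 the element 1 works; for p > 0 we add r = (2 - p^2)/7,
    the choice of r quoted in the text.
    """
    if p <= 0:
        return Fraction(1)
    return p + (2 - p * p) / 7

p = Fraction(7, 5)   # 7/5 is in A because 49/25 < 2
q = step_up(p)       # 7/5 + 1/175 = 246/175, still in A
```

Repeating step_up produces an increasing sequence of elements of A, which is one way to see that A has no largest element. 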


EXERCISE¹ I: Show that if C is a cut, p ∈ C, and r ∉ C, then p < r. (Remember 
this as Anything outside a cut is larger than anything inside.) 


EXERCISES 22.2 
1. Show that the union of any set of cuts is either a cut or is all of Q. 


2. (a) Check that the reference to the proof of Theorem 5.4 is valid. 


(b) Show that if p is a rational number, then so is r = (2 − p^2)/7. 


22.3 THE ALGEBRA OF CUTS 


DEFINITION 22.2: If A and B are cuts, we define their sum A + B by: A + B = 
{r : r = p + q for some p ∈ A and q ∈ B}. 


THEOREM 22.3: If A and B are cuts, then A + B is a cut. 


PROOF: (i) A and B are nonempty, and so A + B is nonempty. Let p_0 and q_0 be 
in the complements of A and B, respectively. By Exercise I, p_0 > p and q_0 > q for 
all p ∈ A and q ∈ B. Then p_0 + q_0 > p + q for all such p and q, and so 
p_0 + q_0 ∉ A + B. Thus A + B ≠ Q. 

(ii) Let r ∈ A + B, s < r, and d = r − s ∈ Q. Since r ∈ A + B, there are 
elements p and q of A and B with r = p + q. Now s = r − d = (p − d) + q ∈ A + B 
since p − d ∈ A. 

(iii) Let r = p + q ∈ A + B, and p < p_0 ∈ A. Then r < p_0 + q ∈ A + B. ∎ 
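
Step (ii) of this proof is constructive, and a small Python sketch may make it concrete. The names here (in_A, in_B, split_smaller) are invented for illustration; A is taken to be the cut of Example 2 and B the rational cut 1*, both modeled as membership tests on exact rationals. 

```python
from fractions import Fraction

def in_A(p):
    """The cut A of Example 2: {p > 0 and p^2 < 2} U {p <= 0}."""
    return p <= 0 or p * p < 2

def in_B(q):
    """The rational cut 1* = {q in Q : q < 1}."""
    return q < 1

def split_smaller(p, q, s):
    """Given p in A, q in B and s < p + q, return (p - d, q) with d = (p + q) - s.

    The pair sums to s, and p - d lies in A because p - d < p.
    """
    d = (p + q) - s
    return p - d, q

p, q = Fraction(7, 5), Fraction(1, 2)   # r = p + q = 19/10 is in A + B
s = Fraction(1)                         # a rational below r
p_new, q_new = split_smaller(p, q, s)   # (1/2, 1/2)
```

Only the element taken from A is moved; the element of B is reused unchanged, exactly as in the proof. 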


THEOREM 22.4: Addition of cuts obeys rules (1) through (4) of the definition 
of a field. 


PROOF: Addition is clearly commutative and associative. We might guess that 
0* acts as a zero element (this cut was defined just before Exercise I). We must 
show that A + 0* = A for any cut A. Here we begin to see the usefulness of 
having constructed the real numbers as we have. We must establish equality of 
two sets, and so we have all of those techniques available to us. If p ∈ A and 
z ∈ 0*, then z < 0, and so p + z < p and p + z ∈ A. Thus A + 0* ⊆ A. To establish 
the other inclusion, we must show that any element of A can be written as the 
sum of an element of A and a negative number (an element of 0*). Let p ∈ A and 
r > 0 with q = p + r ∈ A (be sure you see why such an r exists). Then 
p = q + (−r) ∈ A + 0*, and so A ⊆ A + 0*. (The proof is not yet complete, but we 
return now to the discussion.) 


The question of additive inverses is more delicate. We must define −A so that 
A + (−A) = 0*. In other words, the sum of an element of A and an element of −A 
must be negative. We might make a guess: 


???  −A = {p ∈ Q : p + q < 0 for all q ∈ A}  ??? 


But this doesn't work. If this were our "definition," we would have 
−(1*) = {p ∈ Q : p ≤ −1}, which is not a cut. 

It seems reasonable to expect that −(1*) = (−1)* (the set above without the 
element −1). Perhaps we can adjust our definition to exclude −1. Does −1 
behave differently in this construction than other elements? When we "add" −1 
to 1*, we get: −1 + {p : p < 1} = {p : p < 0}. On the other hand, if q < −1, we 
have q + {p : p < 1} = {p : p < 1 + q}. Note that 1 + q < 0. For all elements p in 
{p ∈ Q : p ≤ −1} except −1, the supremum of p + 1* is negative, whereas for −1 the 
supremum is 0. Evidently −(1*) should consist of all q such that sup(q + 1*) < 0. 
We have to work around the fact that we can't really say "supremum" yet. 


DEFINITION 22.5: For a cut A, let −A be defined by 
−A = {p ∈ Q : ∃r_p < 0 such that ∀q ∈ A (p + q < r_p)}. 


THEOREM 22.6: If A is a cut, −A is a cut. 


PROOF: (i) Suppose p′ ∉ A and let p = −p′ − 1. For any q ∈ A we have 
q + p = q − p′ − 1 < −1 since p′ > q. So p ∈ −A (we may take r_p = −1), and so 
−A ≠ ∅. If q ∈ A, then −q ∉ −A (since q + (−q) = 0), and so −A ≠ Q. 

(ii) Let p ∈ −A and r_p < 0 be as in the definition. If p_0 < p, then 
p_0 + q < p + q < r_p for all q ∈ A, so that p_0 ∈ −A (that is, we can take 
r_{p_0} = r_p). 


(iii) Let p and r_p be as above and let p_1 = p − (1/2)r_p. Note that p_1 > p since 
r_p < 0. For any q ∈ A, we have 


q + p_1 = q + p − (1/2)r_p < r_p − (1/2)r_p = (1/2)r_p, 


and so we can take r_{p_1} = (1/2)r_p to show that p_1 ∈ −A. ∎ 


THEOREM 22.7: For any cut A, we have A + (−A) = 0*. 


PROOF: Let q ∈ A and p ∈ −A. Then −p > q since −p ∉ A. So q + p < 0; that is, 
q + p ∈ 0*, and A + (−A) ⊆ 0*. The other inclusion is trickier. Let p ∈ 0*. We must 
write p as the sum of an element of A and an element of −A. Let r = −(1/2)p. 
Now r > 0, and so there is an integer n so that nr ∈ A but (n + 1)r ∉ A (since the 
rational numbers have the Archimedean property). Let s = −(n + 2)r. Then 
s ∈ −A since for any q ∈ A we have: 


q + s = q − (n + 2)r < (n + 1)r − (n + 2)r = −r < 0 


[since (n + 1)r ∉ A, it is larger than q]. Finally, p = nr + s ∈ A + (−A), and so 
0* ⊆ A + (−A). ∎ (This also completes the proof of Theorem 22.4.) 
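
The second half of this proof is an explicit recipe, and it can be tried out on a concrete cut. The following Python sketch is an illustration only: the names in_A and decompose are invented here, the cut is modeled as a membership test on exact rationals, and the search for n is one simple way (assumed, not taken from the text) of finding the integer whose existence the Archimedean property guarantees. 

```python
from fractions import Fraction

def in_A(q):
    """The cut A of Example 2: {q > 0 and q^2 < 2} U {q <= 0}."""
    return q <= 0 or q * q < 2

def decompose(p, in_cut):
    """For rational p < 0, return (n*r, s) with n*r in the cut and n*r + s == p.

    Follows the proof: r = -p/2 and s = -(n + 2)*r, where n is an integer
    with n*r in the cut but (n + 1)*r outside it.
    """
    r = -p / 2
    n = 0
    # Step down until n*r is inside the cut, then up while (n + 1)*r is inside.
    while not in_cut(n * r):
        n -= 1
    while in_cut((n + 1) * r):
        n += 1
    s = -(n + 2) * r
    return n * r, s

p = Fraction(-1, 10)
a, s = decompose(p, in_A)   # r = 1/20, n = 28, a = 7/5, s = -3/2
```

Here (n + 1)r = 29/20 lies outside A, so every q in A satisfies q + s < 29/20 − 3/2 = −1/20 < 0, which is the witness that s belongs to −A. 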


We have found an additive structure for cuts. We now turn our attention to 
multiplication. When we were young, we were taught first how to multiply 
positive numbers, and then we were shown how to find the sign of a product if it 
involved negative numbers. (We were never really taught multiplication for 
negative numbers!) We will use the same approach here, but we must be a bit 
careful because "positive" doesn't have any meaning to us yet. 


DEFINITION 22.8: Let A and B be cuts such that 0 ∈ A ∩ B. We define their 
product AB by 


AB = {pq : p ∈ A, q ∈ B, p > 0, and q > 0} ∪ {p : p ≤ 0}. 


[By part (iii) of the definition of a cut, any cut that contains 0 also contains 
positive rational numbers. These cuts will ultimately become the positive real 
numbers.] 


THEOREM 22.9: If A and B are cuts with 0 ∈ A ∩ B, then AB is a cut. 


PROOF: (i) Follows as in the proof for addition. 

(ii) Since {p : p ≤ 0} ⊆ AB by definition, we need only check the case where 
r = pq ∈ AB with p, q > 0, and 0 < s < r. Write s = (s/r)r = [(s/r)p]q. Now s/r < 1, 
and so p > (s/r)p ∈ A. Since q ∈ B, we have written s as the product of a positive 
element of A and a positive element of B, and so s ∈ AB. 

(iii) Let r ∈ AB. If r ≤ 0, then for any 0 < p ∈ A and 0 < q ∈ B, we have 
r ≤ 0 < pq ∈ AB. If r = pq, where 0 < p ∈ A and 0 < q ∈ B, and p < p_0 ∈ A, we 
have r < p_0 q ∈ AB. ∎ 
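
The rescaling used in step (ii) can be checked on a small example. This Python sketch is illustrative only: the names in_A, in_B, and refactor are invented here, A is the cut of Example 2, and B is taken to be the rational cut 3*, both of which contain 0. 

```python
from fractions import Fraction

def in_A(p):
    """The cut A of Example 2; it contains 0."""
    return p <= 0 or p * p < 2

def in_B(q):
    """The rational cut 3* = {q in Q : q < 3}; it contains 0."""
    return q < 3

def refactor(p, q, s):
    """Given positive p in A, q in B and 0 < s < p*q, return (p1, q) with p1*q == s.

    Here p1 = (s/r)*p with r = p*q, so 0 < p1 < p and hence p1 is in A.
    """
    r = p * q
    return (s / r) * p, q

p, q = Fraction(7, 5), Fraction(2)   # r = pq = 14/5 is in AB
s = Fraction(1)                      # 0 < s < r
p_new, q_new = refactor(p, q, s)     # (1/2, 2)
```

As in the proof for addition, only the factor taken from A is adjusted; the factor from B is kept. 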


Multiplication (when it is defined) is clearly associative and commutative. To 
extend the definition of multiplication to all cuts, we first prove the following, 
which will become the trichotomy when we define the positive set: 


THEOREM 22.10: For any cut A, exactly one of the following holds: 


(i) 0 ∈ A; 
(ii) A = 0*; 
or (iii) 0 ∈ −A. 


PROOF: We will show first that no two of these can happen at the same time. 
This is clear for the pair (i) and (ii) and the pair (ii) and (iii) (since 0* is its own 
additive inverse). Suppose that (i) and (iii) both occur for some cut A. Then 
0 = 0 + 0 ∈ A + (−A) = 0*, which is a contradiction since 0 ∉ 0*. Now we show that 
one of these must occur. Suppose that both (i) and (ii) fail for the cut A. Since 
0 ∉ A, there is some p < 0 with p ∉ A. For any q ∈ A we then have q + 0 = q < p < 
0, so that 0 ∈ −A. ∎ 


DEFINITION 22.11: For cuts A and B, let the product AB be given by 


AB (as above)     if 0 ∈ A ∩ B 
−(A(−B))          if 0 ∈ A \ B and B ≠ 0* 
(−A)(−B)          if 0 ∉ A ∪ B and A, B ≠ 0* 
0*                if A = 0* or B = 0*. 


Notice that the products that are computed in this definition involve only cuts 
that contain zero. Since we have already established that the product of two such 
cuts is a cut, and that the additive inverse of a cut is a cut, we need not establish 
these results again. The definition of the multiplicative inverse and the 
associated proofs are much like those for the additive inverse and are omitted. 


THEOREM 22.12: The set of cuts is a field. ∎ 


DEFINITION 22.13: The real numbers: R = {A ⊆ Q : A is a cut}. 


EXERCISE II: Let A and B be cuts. Show that either A ⊆ B or B ⊆ A. 


This is more significant than it appears. It will play an important role in the 
definition of the ordering on the real numbers. 


EXERCISES 22.3 
1. Show that −(1*) = {p ∈ Q : p ≤ −1} under the "guessed" definition. 
2. Verify that addition and multiplication are associative and commutative. 


3. (a) Show that 1* + 2* = 3* and that p* + q* = (p + q)* for any p and q. 


(b) Show that p*q* = (pq)* for any rational numbers p and q. (If you have 
taken algebra, you recognize that this exercise says the rational numbers are 
isomorphic to the rational cuts.) 


4. Show that the cut A in Example 22.2.2 is √2; that is, show that AA = 2*. 
(Caution! The elements of AA are not just the squares of elements of A.) 


22.4 THE ORDERING OF CUTS 


DEFINITION 22.14: The positive set of real numbers is given by 
P = {A ∈ R : 0 ∈ A}. 


We have already established the trichotomy. The other properties of the positive 
set are easy to show. 


EXERCISE III: Show that A < B if and only if A ⊊ B. 


This highlights the dual nature of the real numbers and establishes a geometric 
fact you probably suspected already. The real numbers are elements of an 
ordered field, but they are also sets. 


EXERCISES 22.4 


1. Verify that P is a positive set. 


22.5 THE CUTS ARE THE REAL NUMBERS! 


All the preceding work leads to the following, which says that the set of real 
numbers, as defined in this chapter, has one of the properties of the Big Theorem 
(and hence has them all). Note well: This is now a theorem, not an axiom. 


THEOREM 22.15: R has the Least Upper Bound property. 


PROOF: Let S = {A_α : α ∈ Λ} be a nonempty set of real numbers that is 
bounded above. (Other than the requirement that Λ ≠ ∅, the nature of the index 
set Λ is unimportant.) Since S is bounded above, there is a real number A so that 
A_α ≤ A for all α ∈ Λ. By Exercise II, this means A_α ⊆ A for all α ∈ Λ. Let 
B = ∪_{α∈Λ} A_α. It is easy to see that B is a real number and that it is an upper 
bound for S. Suppose C is an upper bound for S; that is, A_α ≤ C (A_α ⊆ C) for all 
α. Then B = ∪_{α∈Λ} A_α ⊆ C; in other words, B ≤ C. So B is the least upper bound 
of S. ∎ 


The brevity of this proof is a good example of an important pattern in 
mathematics. The work in this chapter has gone into picking the right definitions. 
Having done this well, the important theorems can be proved easily. 


EXERCISES 22.5 


1. Since R is linearly ordered, we may define cuts of real numbers (the 
definition is the same as for cuts in Q). 


(a) Show that if C is a cut of real numbers, then C is bounded above and 
C = {x ∈ R : x < sup C}. An ordered field F in which every cut is of the form 
{x ∈ F : x < a} for some a ∈ F is said to have the Dedekind property (note 
that Q does not have the Dedekind property). 


(b) In (a) you showed that the Least Upper Bound property implies the 
Dedekind property. (The Least Upper Bound property guarantees that C has a 
supremum.) Now show that any ordered field with the Dedekind property 
also has the Least Upper Bound property. (That is, the Dedekind property is 
part of the Big Theorem.) 


2. Suppose S ⊆ R is bounded above. Let 
U = {y : y < x for some x ∈ S}. 
Show that U is a cut and that sup S = sup U. 


3. The phrase "limit superior" has been defined three times in this book for 
three different sorts of objects: Collections of sets (Exercise 1.15.9), 
individual sets (Exercise 7.6.6), and sequences of numbers (Exercise 
10.4.10). Use the definition of a real number as a Dedekind cut to relate 
these definitions to each other. For example, if each term in a sequence (x_n) 
is considered to be a cut, is the limit superior of (x_n) (a collection of sets) the 
same as the limit superior of (x_n) (a sequence of numbers)? 


4. We could also construct the real numbers from Cauchy sequences of rational 
numbers. The first problem we encounter is that a number can be the limit of 
many sequences. If X = (x_n) and Y = (y_n), recall that we have defined the 
weave of X and Y to be X W Y = x_1, y_1, x_2, y_2, ... (see Exercise 10.4.13). Let 
us say the sequence X is equivalent to the sequence Y if X W Y is a Cauchy 
sequence. Show that this is an equivalence relation (see Exercise 2.1.5). Now 
let R be the collection of equivalence classes of such sequences and follow 
the outline of this chapter to show that R, so defined, is an Archimedean 
ordered field in which the Cauchy criterion holds. In view of the Big 
Theorem, all the other properties of the real numbers discussed in Part 2 of 
the book also hold for this field. Note that if one takes this approach, the 
Archimedean property must be established separately, since it is not implied 
by the Cauchy criterion. 


5. Here we investigate whether the field in Exercise 22.5.4 is really the same as 
the one constructed in this chapter. 


(a) Show that any field with the Least Upper Bound property must contain a 
copy of the real numbers. (We've seen before that any ordered field contains 
a copy of the rational numbers.) 


(b) Find a part of the Big Theorem that must be violated if such a field has 
any elements other than those found in (a). 


(c) Does this answer the question at the beginning of the exercise? 


6. In this chapter, we constructed the real numbers from a much more familiar 
set, the rational numbers. In this exercise we will construct the integers from 
the (more familiar) natural numbers. In the next exercise, we will construct 
the rational numbers from the (more familiar) integers. We will assume that 
we know how to (i) add two natural numbers, (ii) recognize when one natural 
number is larger than another, and (iii) subtract a smaller natural number 
from a larger one (but that, in general, we do not know how to subtract 
natural numbers). Consider the set consisting of the symbols 0, (+, n), and 
(−, n) for n = 1, 2, .... (In the end, we will think of the symbols (+, n) as the 
positive integers, and the symbols (−, n) as the negative integers.) 


(a) Define addition and multiplication of these objects. Remember that your 
definition can involve operations defined only on natural numbers. 


(b) Show that the addition and multiplication you have defined are 
commutative. 


(c) Show that every element of this set has an additive inverse. 
(d) Show that multiplication distributes over addition. 
(e) Show that not every element of this set has a multiplicative inverse. 


(f) Define an ordering on this set. (The set of positive elements—those that 
are greater than 0 in your ordering—should have the same properties as the 
positive set in an ordered field: It should be nonempty; the sum of two 
positive elements should be positive; the product of two positive elements 
should be positive; and the trichotomy should hold.) 


7. Here we construct the rational numbers from the integers. 


(a) Consider the set of ordered pairs (p, q), where p and q are integers and q ≠ 
0. While we might want to consider (p, q) as being the same as the "quotient" 
p/q, we can't do so. Why? Consider the (different) pairs (1, 2) and (3, 6). 


(b) Define a relation ~ between these pairs by (p, q) ~ (m, n) if pn = qm. 
Show that this is an equivalence relation (see Exercise 2.1.5). 


(c) Denote the equivalence class of the pair (p, q) by [p, q]. We will define 
addition of these equivalence classes by 


[p, q] + [m, n] = [pn + qm, qn]. 


Show that this operation yields a valid equivalence class. 


(d) Show that the operation defined in (c) is well-defined, that is, if (p, q) ~ 
(r, s) and (m, n) ~ (j, k), then [p, q] + [m, n] = [r, s] + [j, k]. ["Well-defined" is 
a deceptive phrase. It appears to mean something very general when, in the 
mathematical sense, it means something very specific: Whenever you define 
an operation on equivalence classes by referring to representatives of those 
classes (as we have done here), you must ask whether the result would be the 
same if you had begun with a different representative of each class.] 


(e) This collection of equivalence classes of ordered pairs of integers is what 
we will call the rational numbers. Define multiplication on this set and 
show that the resulting structure is a field. 


(f) Define a positive set on the rational numbers and show that the result is an 
ordered field. 


8. Define "ordered pair" without using any of the words "first," "second," "left," 
"right," or any other "directional" indicator. Whatever your definition is, it 
has to be the case that (a, b) = (c, d) if and only if a = c and b = d. 


1 Exercises I, II, and III, which are in the text of this chapter, play such an important role in the 
development that they should be done as they appear. 


2 The proof of Theorem 22.4 extends all the way to the end of the proof of Theorem 22.7. We will need 
some intermediate results to complete it. 
